This paper considers the optimal control of time-varying continuous-time Markov chains whose transition rates are themselves Markov processes. In one set of problems the solution of an ordinary differential equation is shown to determine the optimal performance and feedback controls, while some other cases are shown to lead to singular optimal control problems which are more difficult to solve.

Solution techniques are demonstrated using examples ranging from finance to behavioral decision making.


1. Introduction. For over five decades the subject of control of Markov processes has enjoyed tremendous successes in areas as diverse as manufacturing, communications, machine learning, population biology, management sciences, clinical systems modelling and even human memory modeling [12]. In a Markov decision process (MDP) the transition rates depend upon controls, which can be chosen appropriately so as to achieve a particular optimization goal. The subject of this paper is to explore a class of MDPs where the transition rates are, in addition, dependent upon the state of another stochastic process and are thus Markov processes themselves. Our purpose is to describe a broad range of optimal control problems in which these so-called cascade Markov decision processes (CMDP) admit explicit solutions [1], as well as problems in which dynamic programming is not applicable at all.

Cascade processes are ideal in modeling games against nature. An epidemic control system where infection rates vary in accordance with uncontrollable factors such as the weather is one such case. They are also applicable in behavioral models of decision making where available choices at each step may be uncertain. For example, a behavioral decision-making problem called the "Cat's Dilemma" first appeared in [7] as an attempt to explain "irrational" choice behavior in humans and animals where observed preferences seemingly violate the fundamental assumption of transitivity of utility functions [6], [9]. In this problem, each day the cat needs to choose one among the many types of food presented to it so as to achieve a long-term balanced diet goal. However, the pet owner's daily selection of food combinations presented to the cat is random. The cat's feeding choice forms a controlled Markov chain, but the available foods themselves are contingent on the owner's whim. Another example is found in dynamically-hedged portfolio optimization, where dynamic (stochastic) rebalancing of allocated weights can be modeled as a controlled Markov chain. However, what reallocations are possible may depend on the current prices of assets, which are themselves stochastic. Such MDP models have the advantage, for example, of being more realistic than their continuously-hedged counterparts, which have traditionally been studied using Gauss/Markov models on augmented state spaces [11], [8]. Other examples where CMDPs are applicable include queuing systems where service times depend on the state of another queue and models of resource sharing where one process requires exclusivity and another doesn't (e.g., determining the optimal sync rate for an operating system).

While a cascade Markov process can be equivalently represented on the joint (coupled) state space as a non-cascade process, the main purpose of this paper is to investigate solutions on decomposed state spaces. The main contributions in doing so include:

• Decoupled matrix differential equations as solutions to a variety of fully observable cascade problems involving optimization of the expectation of a utility functional, which are computationally easier to implement than their non-decoupled counterparts, and require solving a one-point instead of a two-point boundary value problem.

• Reduction of a partially observable cascade optimal control problem to a lower dimensional non-cascade problem (via a process we call diagonalization) that facilitates the use of standard optimization techniques on a reduced state space, thereby circumventing the "curse of dimensionality".

• Simpler analysis, via diagonalization, of a class of problems that involve optimization of a non-linear function of expectation (such as a fairness or diversity index) and a full solution to a particular example of such singular optimal control problems.

• A simple toy model for the dynamically-hedged portfolio optimization problem and solutions that can be easily generalized to computationally feasible algorithms for optimal allocation of large scale portfolios.

In addition to having the advantages of being able to efficiently represent large state space Markov processes by factorization to simpler lower dimensional problems and thus derive computationally simpler solutions, our approach of working with decomposed representations is generalizable to multi-factor processes, stochastic automata networks [10], and even quantum Markov chains and controls [5], [3].

The particular framework of Markov decision processes closely follows the assumptions and modeling of [1], which are characterized by finite or denumerably many states with perfect state observations and affine dependence of transition rates on controls. The paper is organized as follows. A mathematical framework is first outlined, more details of which are in Appendix A. We then derive solutions to two classes of optimal control problems. In the first case the cost function is the expectation of a functional, one that can be solved by dynamic programming requiring solution to a one-point boundary value problem. The second class is the case where the cost function cannot be written as an expectation, a rather non-standard stochastic control problem but one that arises in applications requiring diversification (entropy) maximization or variance minimization and requires solution to two-point boundary value problems. In many cases the latter is a singular optimal control problem. We will then discuss toy examples in each class of problems: a portfolio optimization problem and an animal behavior (decision-making) problem. More examples of portfolio optimization and their cascade solutions appear in the Appendix.

2. Cascade Markov Decision Processes. 

2.1. Markov Decision Process Model. We use the framework of [1] for continuous-time finite-state (FSCT) Markov processes. We assume a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and right-continuous stochastic processes adapted to a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in T}$ on this space. An FSCT Markov process $x_t$ that is assumed to take values in the set of $n$ standard basis vectors $\{e_i\}_{i=1}^{n}$ in $\mathbb{R}^n$ has the following sample path (Ito) description:

(2.1) $dx = \sum_{i=1}^{m} G_i x \, dN_i$

(2.2)

where $G_i \in \mathbb{G}^n$ are distinct^1, $\mathbb{G}^n$ being the space of square $n$-matrices of the form $F_{ji} - F_{ii}$, where $F_{ij}$ is the matrix of all zeros except for a one in the $i$'th row and $j$'th column, and $N_i$ are Poisson counters with rates $\lambda_i$. The resulting infinitesimal generator that governs the transition probabilities of the process is $P \in \mathbb{P}^n$, the space of all stochastic $n$-matrices, and is given by:

$P = \sum_{i=1}^{m} G_i \lambda_i$

In a Markov decision process, the transition rates are allowed to depend on $\mathcal{F}_t$-progressively measurable control processes $u = (u_1, u_2, \ldots, u_p)$ in an affine manner, in accordance with^2:

$\lambda_i = \lambda_{i0} + \sum_{j=1}^{p} \mu_{ij} u_j$

so that the infinitesimal generator can be written as:

$P(u) = \sum_{i=1}^{m} G_i \left( \lambda_{i0} + \sum_{j=1}^{p} \mu_{ij} u_j \right)$
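As a rough illustration of the construction above, the following sketch assembles a generator from jump matrices and affinely controlled rates. The three-state chain, the jump list, and the names `base` and `gains` (standing in for the uncontrolled rates and the control coefficients) are illustrative assumptions, not values taken from the paper.

```python
# Sketch: assemble P(u) = sum_i G_i * lambda_i(u) for a small chain.
# The jump list, base rates and control gains are hypothetical values.

def jump_matrix(n, src, dst):
    """Return F_{dst,src} - F_{src,src}: moves probability from src to dst."""
    G = [[0.0] * n for _ in range(n)]
    G[dst][src] += 1.0
    G[src][src] -= 1.0
    return G

def affine_rates(base, gains, u):
    """lambda_i = base_i + sum_j gains[i][j] * u_j (affine in the controls)."""
    return [b + sum(g * uj for g, uj in zip(row, u))
            for b, row in zip(base, gains)]

def generator(n, jumps, rates):
    """Sum G_i * lambda_i over all Poisson counters."""
    P = [[0.0] * n for _ in range(n)]
    for (src, dst), lam in zip(jumps, rates):
        G = jump_matrix(n, src, dst)
        for r in range(n):
            for c in range(n):
                P[r][c] += lam * G[r][c]
    return P

jumps = [(0, 1), (1, 2), (2, 0)]   # hypothetical transitions (src, dst)
base = [1.0, 2.0, 0.5]             # hypothetical uncontrolled rates
gains = [[0.5], [-1.0], [0.0]]     # hypothetical control coefficients
P = generator(3, jumps, affine_rates(base, gains, [0.4]))

# Every column of a generator in this convention sums to zero.
column_sums = [sum(P[r][c] for r in range(3)) for c in range(3)]
```

The column convention (entry in row `dst`, column `src`) matches the form of the jump matrices described above; rates must stay nonnegative for the result to be a valid generator.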

2.2. Cascade MDP Model. We are interested in the case where transition rates of $x_t \in \{e_i\}_{i=1}^{n}$ are themselves stochastic: specifically, they depend on the state of another Markov process, say, $z_t \in \{e_i\}_{i=1}^{r}$. We will say that such a pair forms a Cascade Markov chain (CMC). In general, various levels of interaction between the two processes $x_t$ and $z_t$ define a joint Markov process $y_t = z_t \otimes x_t$ that evolves on the product space $\{e_i\}_{i=1}^{r} \times \{e_i\}_{i=1}^{n}$ (see Appendix A), but we are specifically interested in CMCs where sample paths of $z_t$ and $x_t$ have the following Ito description (Proposition A.7, Appendix A):

(2.3) $dz = \sum_{i=1}^{s} H_i z \, dM_i$

(2.4) $dx(z) = \sum_{i=1}^{m} G_i(z) x \, dN_i(z)$

where $H_i \in \mathbb{G}^r$, $G_i(z) \in \mathbb{G}^n$ and the rates of the Poisson counters $M_i$ and $N_i$ are $\nu_i$ and $\lambda_i$, with $\lambda_i$ depending on the state of $z_t$. Thus the infinitesimal generators $P$ and $C$ of $x_t$ and $z_t$ ($P$ depends on $z_t$, and $P(z)$ propagates the conditional probabilities of $x_t$ given $z$) are

(2.5) $P(z) = \sum_{i=1}^{m} G_i \lambda_i(z)$

(2.6) $C = \sum_{i=1}^{s} H_i \nu_i$

In a Cascade Markov decision process (CMDP), we assume the rates $\lambda_i$ of the counters $N_i$ are allowed to additionally depend on $\mathcal{F}_t$-progressively measurable control processes $u = (u_1, u_2, \ldots, u_p)$ in accordance with^3

$\lambda_i(z) = \lambda^0_{i0} + \lambda_{i0}(z) + \sum_{j=1}^{p} \mu_{ij}(z) u_j$

so that the conditional probability vector $p(z,u)$^4 of $x_t$ given $z$ evolves according to the generator obtained by substituting these rates into (2.5), which will be abbreviated as

(2.7) $P(z,u) = A_0 + A(z) + \sum_{j=1}^{p} u_j B_j(z)$

(2.8) $\dot p(z,u) = P(z,u) \, p(z,u)$

The CMDP model is completely specified by $(A_0, A, B_j)$.

^1 If the $G_i$'s are not distinct, then one can combine the Poisson counters corresponding to identical $G_i$'s to get a set of distinct $G_i$. For example, $G_1 y \, dN_1 + G_1 y \, dN_2$ can be replaced by $G_1 y \, dN$ where $dN = dN_1 + dN_2$, a Poisson counter with rate equal to the sum of the rates of the counters $N_1$, $N_2$.

^2 That is, we assume an affine dependence on controls.

The Admissible Controls, defining $\mathcal{U}$. The requirements on $P(z,u)$ to be an infinitesimal generator for each $z$ put constraints on the matrices $A_0, A, B_j$ and impose admissibility constraints on the controls $u_j$. We will require $A_0$ and $A$ to be infinitesimal generators themselves (for each $t$ and $z$) and the $B_j$ to be matrices whose columns sum to zero (for each $t$ and $z$). We also allow the controls to be dependent on $z$ and $x$, which will define the set of admissible controls $\mathcal{U}$ as the set of measurable functions mapping the space $\{e_i\}_{i=1}^{r} \times \{e_i\}_{i=1}^{n}$ to the space of controls such that the matrix with columns

$f_j = A_0 e_j + A(e_k) e_j + \sum_{i=1}^{p} u_i(e_k, e_j) B_i(e_k) e_j$

for $j = 1..n$, $k = 1..r$ is an infinitesimal generator. Explicit dependence on $t$ is omitted in the notation above for clarity.

^3 Each term is, in addition, a function of time $t$, but for clarity explicit dependence on $t$ will not be specified in the notation.

^4 Same as above.
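The admissibility condition can be checked numerically. The sketch below, for one fixed value of $z$, builds the matrix whose $j$'th column uses the control applied in state $x = e_j$ and tests whether it is a generator (columns summing to zero, nonnegative off-diagonal entries). The two-state matrices and the feedback values are hypothetical, chosen only to show one admissible and one inadmissible feedback law.

```python
# Sketch: test admissibility of a feedback control for one fixed z.
# A0, A, Bs and the feedback values are hypothetical illustration data.

def is_generator(M, tol=1e-12):
    """Columns sum to zero and off-diagonal entries are nonnegative."""
    n = len(M)
    for c in range(n):
        if abs(sum(M[r][c] for r in range(n))) > tol:
            return False
        if any(M[r][c] < -tol for r in range(n) if r != c):
            return False
    return True

def feedback_generator(A0, A, Bs, feedback):
    """Column j is (A0 + A + sum_i u_i(e_j) B_i) e_j, where feedback[j]
    is the control vector applied when x = e_j."""
    n = len(A0)
    F = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for r in range(n):
            F[r][j] = A0[r][j] + A[r][j] + sum(
                u * B[r][j] for u, B in zip(feedback[j], Bs))
    return F

A0 = [[0.0, 0.0], [0.0, 0.0]]
A = [[-1.0, 1.0], [1.0, -1.0]]          # a generator
Bs = [[[-1.0, -1.0], [1.0, 1.0]]]       # one control; columns sum to zero

ok = is_generator(feedback_generator(A0, A, Bs, [[0.5], [-0.5]]))
bad = is_generator(feedback_generator(A0, A, Bs, [[0.0], [1.5]]))
```

In this toy case the second feedback law drives an off-diagonal rate negative, so it falls outside the admissible set.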

2.3. Examples of CMDP. Two toy examples of CMDPs that will be discussed later are outlined below. Some background on the terminology used in the description of portfolio optimization is in Appendix C.1.

2.3.1. Example 1: A Self-Financing Portfolio Model. In this toy example on portfolio optimization^5 we will assume that there is one bond and one stock in the portfolio, with the bond price being fixed at 1 and the stock having two possible prices, 1 and $-1/3$. Thus the price vector takes values in the set $\{(1, 1), (1, -1/3)\}$. Assume a portfolio that can shift weights between the two assets with allowable weights $W$ of $(0, 2)$, $(-1, -1)$, $(0, -2)$ so that the portfolio has a constant total position. Further, we allow only weight adjustments of $+1$ or $-1$ for each asset, and we further restrict the weight shifts to only those that do not cause a change in net value for any given asset price. The latter condition makes the portfolio self-financing.

The resulting process can be modeled as a cascade MDP. Let $z_t$ be the (joint) prices of the two assets with prices $(1, 1)$, $(1, -1/3)$ represented as states $e_1, e_2$ respectively. Let $x_t$ be the choice of weights with weights $(0, 2)$, $(-1, -1)$, $(0, -2)$ represented as states $e_1, e_2, e_3$ respectively. Transition rates of $z_t$ are determined by some pricing model, whereas the rates of $x_t$, which represent allowable weight shifts, are controlled by the portfolio manager. The portfolio value $v(z_t, x_t)$ can be written using its matrix representation, $v(z,x) = z^T V x$, where $V$ is


(2.9) 




^5 See Appendix C, Section C.1 for some basic definitions on portfolio optimization.

The portfolio manager is able to adjust the rate $u$ of buying stock (which has the effect of simultaneously decreasing or increasing the weight of the bond). The resulting transitions of $x_t$ depend on $z_t$ (see Figure 1(a)) and the transition matrices $P(z)$ of the weights $x_t$ can be written as $P(z) = A(z) + uB(z)$, where $A(z)$ and $B(z)$ are:


^0 0 0 


'-1 1 


A(ei) = Ul -I 1 ^(^2) = H 1 -10 




For $P(z)$ to be a proper transition matrix, admissible controls $u$ must satisfy a bound on $|u|$. The portfolio manager may choose $u$ in accordance with the current values of $x_t$ and $z_t$, so that $u$ is a Markovian feedback control $u(t, z_t, x_t)$. Note that this model differs from the traditional Merton-like models where only feedback on the total value $v_t$ is allowed. Note also that it is the self-financing constraint that leads to the dependence of the transitions of $x$ on the current price $z_t$, which allows us to model this problem as a cascade.
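The self-financing restriction can be made concrete with a little arithmetic. The sketch below takes the price vectors and weight vectors exactly as listed in this example, computes the portfolio value for every (price, weight) pair, and reports which pairs of weight states have equal value under each price, i.e. which shifts leave the net value unchanged. The function names are ours, and the resulting table is only a consequence of the listed numbers, not a quotation of the paper's matrix $V$.

```python
from fractions import Fraction

# Prices (bond, stock) for z = e_1, e_2 and weights for x = e_1, e_2, e_3,
# taken as listed in the example; exact fractions avoid rounding issues.
prices = [(Fraction(1), Fraction(1)), (Fraction(1), Fraction(-1, 3))]
weights = [(0, 2), (-1, -1), (0, -2)]

def value(price, weight):
    """Net portfolio value: inner product of prices and weights."""
    return sum(p * w for p, w in zip(price, weight))

# values[k][j] is the value when z = e_{k+1} and x = e_{j+1}.
values = [[value(p, w) for w in weights] for p in prices]

def self_financing_pairs(k):
    """Pairs of weight states (1-based) with equal value under price e_{k+1}."""
    row = values[k]
    return [(a + 1, b + 1)
            for a in range(len(row))
            for b in range(a + 1, len(row))
            if row[a] == row[b]]

pairs_under_e1 = self_financing_pairs(0)
pairs_under_e2 = self_financing_pairs(1)
```

Under these numbers, value-preserving shifts exist only between the second and third weight states at the first price and only between the first and second weight states at the second price, which is why the allowed transitions of $x_t$ depend on $z_t$.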



Fig 1. Transition diagrams of the weight $x(t)$ in the self-financing portfolio for the asset prices $z(t)$ are shown in (a) and (b). States $e_1, e_2$ of $z(t)$ correspond to price vectors $(1, 1)$, $(1, -1/3)$ respectively. Self-transitions are omitted for clarity.


2.3.2. Example 2: The Cat's Dilemma Model. As an example of a cascade MDP, we discuss the cat feeding problem introduced in Section 1. The feeding cat is represented by the process $x(t)$ with four states $e_4$ = Unfed, $e_1$ = Ate Meat, $e_2$ = Ate Fish, $e_3$ = Ate Milk. We assume a constant feeding rate $f$, and a constant "satisfaction" (digestion) rate $s$ for each food, upon feeding which the cat always returns to the Unfed state. The Markov process $z(t) \in \{e_1, e_2, e_3\}$ represents availability of different combinations of food, where $e_1, e_2, e_3$ denote the combinations {Fish, Milk}, {Meat, Milk} and {Meat, Fish} respectively. The food provider is unaffected by the cat's eating rate, and so we can model the process as a cascade (see Figure 2).

[Figure panels: (a) z = e_1 (Fish/Milk), (b) z = e_2 (Milk/Meat), (c) z = e_3 (Meat/Fish)]

Fig 2. Transition diagrams of the cat feeding states $x(t)$ in the Cat's Dilemma for the food combinations $z(t)$ are shown in (a), (b) and (c). Self-transitions are omitted for clarity.

The transition matrix $P$ of $x$ is given by

(2.10) $P(z, u) = A_0 + A(z) + B(z) u$

where the control $u(z,x)$, taking values in a bounded interval, represents the cat's choice strategy (extreme values denoting strongest affinity for a particular food in the combination $z$), and with $A_0$, $A(z)$ and $B(z)$ given by


$A_0 = \begin{pmatrix} -s & 0 & 0 & 0 \\ 0 & -s & 0 & 0 \\ 0 & 0 & -s & 0 \\ s & s & s & 0 \end{pmatrix}$

and, for $k = 1, 2, 3$, $A(e_k)$ and $B(e_k)$ are $4 \times 4$ matrices whose only nonzero entries lie in the fourth column (transitions out of the Unfed state).

3. Optimal Control Problem Type I: Expected Utility Maximization. As alluded to in the introduction, the first category of optimal control problems on cascade MDPs is one where the performance measure is the expectation of a functional, and hence linear in the underlying probabilities. We will primarily discuss the fully observable (full feedback), finite time-horizon case and derive a general solution as a matrix differential equation.

3.1. Problem Definition. Fix a finite time horizon $T$ on the cascade MDP $(z_t, x_t)$ defined in Section 2.2 and define the cost function $\eta$,

(3.1) $\eta(u) = \mathbb{E} \left[ \int_0^T \left( z^T(\sigma) L^T(\sigma) x(\sigma) + \psi(u(\sigma)) \right) d\sigma + z^T(T) \Phi^T(T) x(T) \right]$

where $c$, $\phi$ are real-valued functions on the space $\mathbb{R}^+ \times \{e_i\}_{i=1}^{r} \times \{e_i\}_{i=1}^{n}$ that are represented by the real matrices $L(t)$ and $\Phi(t)$ as $c(t,z,x) = z^T L^T(t) x$ and $\phi(t,z,x) = z^T \Phi^T(t) x$, and $\psi$ is a (Borel) measurable function $\mathbb{R}^p \to \mathbb{R}$. If $c$ is bounded, the problem of finding the solution to

(3.2) $\eta^* = \min_{u \in \mathcal{U}} \eta(u)$

is well-defined and will be subsequently referred to as Problem (OCP-I). The corresponding optimal control is given by

(3.3) $u^* = \arg\min_{u \in \mathcal{U}} \eta(u)$

3.2. Solution Using Dynamic Programming Principle.

Theorem 3.1. Let $(z_t, x_t)$ be a cascade MDP as defined in Section 2.2 with $C, A_0, A$ and $B_i$ as defined therein. Let $T > 0$, and let $L, \psi, \Phi$ and $\eta$ be as defined in Section 3.1. Then there exists a unique solution to the equation (on the space of $n \times r$ matrices)

(3.4) $\dot K = -KC - L - A_0^T K - A^T(z)K - \min_{u(z,x) \in \mathcal{U}} \left( \sum_{i=1}^{p} u_i z^T K^T B_i(z) x + \psi(u) \right)$

(3.5) $K(T) = \Phi(T)$

on the interval $[0,T]$, where $A^T(z)K$ denotes the matrix whose $j$'th column is $A^T(e_j) K e_j$ (which can be more explicitly written as $A^T(z) K z z^T$, that is, the matrix representation of the functional $x^T A^T(z) K z$). Furthermore, if $K(t)$ is the solution to (3.4) then the optimal control problem OCP-I defined in (3.2) has the solution

(3.6) $\eta^* = \mathbb{E}\, z^T(0) K^T(0) x(0)$

(3.7) $u^* = \arg\min_{u(z,x) \in \mathcal{U}} \left( \sum_{i=1}^{p} u_i z^T K^T B_i(z) x + \psi(u) \right)$

Proof. With $z, x, \eta$ as defined above let the minimum return function be $k(t,z,x) = z^T K^T(t) x$, where $K(t)$ is an $n \times r$ matrix, so that $k(0, z(0), x(0)) = \eta^*$. Using the Ito rule for $z^T K^T x$,

$d(z^T K^T x) = \sum_{i=1}^{s} z^T H_i^T K^T x \, dM_i + z^T \dot K^T x \, dt + \sum_{i=1}^{m} z^T K^T G_i x \, dN_i$

Since the process $dN_i - \lambda_i(z) \, dt$, with $\lambda_i(z)$ the controlled rate of Section 2.2, is a martingale, equating the expectation to zero gives

$\mathbb{E}\left( \sum_{i=1}^{m} z^T K^T G_i x \, dN_i \right) = \mathbb{E}\left( g(t,x,z,u) \, dt \right)$

$\mathbb{E}\left( \sum_{i=1}^{s} z^T H_i^T K^T x \, dM_i \right) = \mathbb{E}\left( z^T C^T K^T x \, dt \right)$

with $g(t,x,z,u) = z^T K^T A_0 x + z^T K^T A(z) x + \sum_{i=1}^{p} z^T K^T u_i B_i(z) x$. Writing $c(t,z,x) + \psi(u) = f(t,z,x,u)$ and $z^T C^T K^T x + g(t,x,z,u) = \hat g(t,x,z,u)$, a simple application of the stochastic dynamic programming principle shows that

$z(t)^T \dot K(t)^T x(t) + \min_{u} \left( \hat g(t,x,z,u) + f(t,z,x,u) \right) \geq 0$

The minimum value of 0 is actually achieved by $u^*$, so that the inequality above must be an equality. Introducing the notation $A^T(z)K$, we get precisely (3.4). The proof of uniqueness is identical to that in [1], Theorem 1. □

Note that the Bellman equation (3.4) is very similar to that of a single (non-cascade) MDP, with the additional term $-KC$ representing the backward (adjoint) equation for the process $z(t)$ and the appearance of $z$ in the term for minimization, which permits feedback of the optimal control $u^*$ on $z$ in addition to $x$. The matrix $K$ above is also known as the Minimum Return Function. The above solution is a one-point boundary value problem instead of a two-point one. For small $KC$, the above decouples one column at a time. This form is readily generalizable to multifactor MDPs as well.
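Because (3.4) is a one-point boundary value problem, it can be integrated backward from the terminal condition. The following is a minimal sketch under stated assumptions: a single bounded control $|u| \le a$ with no control cost (so the minimization term equals $-a\,|z^T K^T B(z) x|$), an explicit Euler scheme, and small hypothetical matrices `C`, `A`, `B`, `Phi` that are not taken from the paper. `K[j][k]` stores the cost-to-go for $x = e_j$, $z = e_k$.

```python
# Sketch: backward Euler integration of the Bellman matrix equation
# for one bounded control and zero control cost. All data are hypothetical.

def kdot(K, C, L, A0, A, B, a):
    """Right-hand side of the matrix ODE; A[k], B[k] are the n x n
    matrices A(e_k), B(e_k)."""
    n, r = len(K), len(K[0])
    out = [[0.0] * r for _ in range(n)]
    for k in range(r):          # z = e_k
        for j in range(n):      # x = e_j
            kc = sum(K[j][l] * C[l][k] for l in range(r))        # (K C)_{jk}
            ak = sum((A0[i][j] + A[k][i][j]) * K[i][k]
                     for i in range(n))                          # (A0^T K + A^T(z) K)_{jk}
            c = sum(B[k][i][j] * K[i][k] for i in range(n))      # z^T K^T B(z) x
            out[j][k] = -kc - L[j][k] - ak + a * abs(c)          # -min_{|u|<=a} u*c = a*|c|
    return out

def solve_backward(Phi, C, L, A0, A, B, a, T, steps):
    """Integrate from K(T) = Phi back to t = 0 with explicit Euler steps."""
    dt = T / steps
    K = [row[:] for row in Phi]
    for _ in range(steps):
        D = kdot(K, C, L, A0, A, B, a)
        K = [[K[j][k] - dt * D[j][k] for k in range(len(K[0]))]
             for j in range(len(K))]
    return K

C = [[-1.0, 2.0], [1.0, -2.0]]                     # generator of z (columns sum to 0)
L = [[0.0, 0.0], [0.0, 0.0]]                       # no running cost
A0 = [[0.0, 0.0], [0.0, 0.0]]
A = [[[-1.0, 1.0], [1.0, -1.0]]] * 2               # same A(z) for both z
B = [[[-1.0, -1.0], [1.0, 1.0]]] * 2               # A + u*B is a generator for |u| <= 1
Phi = [[0.0, 1.0], [2.0, 3.0]]                     # terminal cost

K_free = solve_backward(Phi, C, L, A0, A, B, 0.0, 1.0, 1000)   # no control authority
K_ctrl = solve_backward(Phi, C, L, A0, A, B, 0.5, 1.0, 1000)   # bang-bang control
```

With these conventions the controlled cost-to-go should never exceed the uncontrolled one, and a constant terminal cost stays constant, which gives two quick sanity checks on an implementation. A production solver would more likely use an adaptive ODE integrator.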

Corollary 3.2. (Quadratic Cost of Control) Under the hypothesis of the above theorem, if $\psi(u_i) = u_i^2$, then if $u_i(t,z,x) = -\frac{1}{2} z^T(t) K^T(t) B_i(z) x(t)$ lies in the interior of $\mathcal{U}$ then it is the optimal control. Otherwise the optimal control is on the boundary of $\mathcal{U}$. If the former is the case at every $t \in [0,T]$, then equation (3.4) defining the optimal solution becomes (where $M^{.2}$ denotes the element-wise square of a matrix $M$):

$\dot K = -KC - L - A_0^T K - A^T(z)K + \frac{1}{4} \sum_{i=1}^{p} \left( B_i^T(z) K \right)^{.2}$

Corollary 3.3. (No Cost of Control) Under the hypothesis of the above theorem, if $\psi(u_i) = 0$ then the optimal control is at the boundary of $\mathcal{U}$. If $\mathcal{U}$ is defined as the set $\{ |u_i| \le a_i \}$, the optimal control is the bang-bang control $u_i(t,z,x) = -a_i \operatorname{sgn}(z^T(t) K^T(t) B_i(z) x)$ and equation (3.4) defining the optimal solution becomes (with $|\cdot|$ the element-wise absolute value)

$\dot K = -KC - L - A_0^T K - A^T(z)K + \sum_{i=1}^{p} a_i \left| B_i^T(z) K \right|$
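Both corollaries come down to a scalar minimization in each control component, with $c = z^T K^T B_i(z) x$ treated as a known number. The sketch below is only an illustration of that pointwise step; the function names and the interval arguments are ours.

```python
# Sketch: pointwise minimizers behind the two corollaries, for a scalar
# coefficient c = z^T K^T B_i(z) x. Function names are illustrative.

def quadratic_cost_control(c, lo, hi):
    """Minimize u*c + u**2 over lo <= u <= hi.
    The unconstrained minimizer is -c/2; the objective is convex, so
    clipping to the interval gives the constrained minimizer."""
    return min(max(-c / 2.0, lo), hi)

def bang_bang_control(c, a):
    """Minimize u*c over |u| <= a: u = -a*sgn(c).
    When c == 0 every admissible u is optimal; 0.0 is returned."""
    if c > 0:
        return -a
    if c < 0:
        return a
    return 0.0

def minimized_value(c, u, quadratic):
    """Value of the minimized term for a given control."""
    return u * c + (u * u if quadratic else 0.0)

u_q = quadratic_cost_control(1.0, -1.0, 1.0)
u_b = bang_bang_control(2.0, 0.5)
```

For an interior quadratic minimizer the minimized term is $-c^2/4$, and for the bang-bang case it is $-a|c|$, which is where the $\frac{1}{4}(\cdot)^{.2}$ and $a_i|\cdot|$ terms originate.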

3.3. Solution Using The Maximum Principle. The stochastic control problem OCP-I can be formulated as a deterministic optimization problem (and hence also an open-loop optimization problem) using probability densities, permitting the application of variational techniques. While this gives us no particular advantage over the DPP approach in providing a solution to OCP-I, understanding this formulation is useful for a broader class of problems.

First we note that for the cascade MDP of Section 2.2 the transition matrices $P(z,u)$ in (2.7) can be written in open-loop form

(3.8) $P_i = A_i + \sum_{j=1}^{p} B_{ij} D_{ij}$

where $D_{ij}(u)$ is a diagonal matrix with diagonal $[u_j(e_i, e_1) \ldots u_j(e_i, e_n)]^T$, $A_i = A_0 + A(e_i)$, $B_{ij} = B_j(e_i)$ and $P_i(u) = P(e_i, u)$. Next, we can write the evolution of the marginal probabilities $c_i(t) = \Pr\{z(t) = e_i\}$ and joint probabilities $p_{ij}(t) = \Pr\{z(t) = e_i, x(t) = e_j\}$ as the state equations

(3.9) $\dot c = Cc$

$\dot p_i = P_i p_i + p_i \dot c_i / c_i$

where $p_i(t)$ is the vector $[p_{i1}(t) \; p_{i2}(t) \ldots p_{in}(t)]^T$ and $c(t)$ the vector $[c_1(t) \ldots c_r(t)]^T$.
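The second state equation can be read as a product rule. As a short sketch, write $\pi_i(t)$ for the conditional probability vector of $x$ given $z = e_i$ (a symbol introduced here only for this derivation), and assume, as in (2.8), that $\dot\pi_i = P_i \pi_i$ and that $c_i > 0$. Since $p_i = c_i \pi_i$:

```latex
\dot p_i
  = \dot c_i \, \pi_i + c_i \, \dot\pi_i
  = \frac{\dot c_i}{c_i} \, p_i + c_i \, P_i \pi_i
  = P_i p_i + p_i \, \frac{\dot c_i}{c_i}, \qquad c_i > 0 .
```

When $z(t)$ is stationary, $\dot c_i = 0$ and the equation reduces to $\dot p_i = P_i p_i$.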

Now we are ready to show the equivalence of the variational approach to the Bellman approach for the problem (OCP-I).

Theorem 3.4. Let $z \in \{e_i\}_{i=1}^{r}$, $x \in \{e_i\}_{i=1}^{n}$ and $C, A_0, A(z), B_i(z)$ be as defined in Section 2.2 and let the $n$-vectors $p_i(t)$ (for $i = 1..r$) and the $r$-vector $c(t)$ satisfy (3.9) with $P_i$ as defined by (3.8) for $D_{ij}$ arbitrary time-dependent diagonal matrices, considered as controls. Then the minimization of

$\int_0^T \sum_{i=1}^{r} \left( e_i^T L^T p_i + \sum_{j=1}^{p} e^T \psi(D_{ij}) p_i \right) dt + \sum_{i=1}^{r} e_i^T \Phi^T p_i(T)$

for $n \times r$ real matrices $L$ and $\Phi$ and a (Borel) measurable function $\psi : \mathbb{R}^p \to \mathbb{R}$ (with $e$ the vector of ones), subject to the constraints that $P_i \in \mathbb{P}^n$, results in a choice for the $k$'th diagonal element of $D_{ij}$ which equals the optimal control $u_j^*(e_i, e_k)$ of Theorem 3.1.

Proof. Using (3.8), the Hamiltonian $H$ and costate $(q, s)$ for the state equations (3.9) for the minimization problem of the theorem, assuming normality and stationarity of $z(t)$, are

$H = \sum_{i=1}^{r} \left( q_i^T \left( A_i + \sum_{j=1}^{p} B_{ij} D_{ij} \right) p_i + \sum_{j=1}^{p} e^T \psi(D_{ij}) p_i + l_i^T p_i \right)$

(3.10) $\dot q_i = -\left( A_i + \sum_{j=1}^{p} B_{ij} D_{ij} \right)^T q_i - l_i - \sum_{j=1}^{p} \psi(D_{ij}) e$

where $l_i^T = e_i^T L^T$, $\phi_i^T = e_i^T \Phi^T$ and $\psi(D_{ij}) = \psi(u_j(e_i, x))$. Introducing minimization of $H$ with respect to $D_{ij}$ we see that it is achieved by minimizing $\sum_{i=1}^{r} \sum_{j=1}^{p} \left( q_i^T B_{ij} D_{ij} p_i + e^T \psi(D_{ij}) p_i \right)$. Noting that $\psi(D_{ij})$ is also diagonal, simple observation shows that the above is precisely minimized when each component of $\sum_{j=1}^{p} \left( D_{ij}^T B_{ij}^T q_i + \psi(D_{ij}) e \right)$ is minimized for each $i$ (as $p_{ik} > 0$). The maximum principle thus gives the following necessary condition for optimality:

$\dot q_i = -A_i^T q_i - l_i - \min_{D_{ij}} \left( \sum_{j=1}^{p} D_{ij}^T B_{ij}^T q_i + \psi(D_{ij}) e \right)$

We note that since stationarity of $z(t)$ was assumed, the above equation exactly corresponds to each column of the Bellman matrix equation (3.4) for $K$ of Theorem 3.1. (Note that the result is valid for non-stationary $z(t)$ as well, and algebraic manipulation shows the $\dot c_i / c_i$ terms to correspond to the $-KC$ term in (3.4).) □

Remark 3.5. Note that in the variational formulation, linearity of the Hamiltonian in the state variable $p$ for the problem OCP-I resulted in complete decoupling of the state and costate equations $q_i$ and $p_i$, thereby permitting an explicit solution identical to that of (3.4). However, if we restrict the set of admissible controls to allow feedback on the state $x$ but not on the state $z$ in problem OCP-I, we get a non-trivial variant problem, a partial feedback problem, in which case one can see that in the analysis in Theorem 3.4 the minimization of $\sum_{j=1}^{p} \left( q_i^T B_{ij} D_{ij} p_i + e^T \psi(D_{ij}) p_i \right)$, in general, does not lead to a decoupling of the state and costate equations.

3.4. Example: A Self-Financing Portfolio. A toy model of portfolio optimization is discussed as an example of problem OCP-I. Appendix B has a short background on portfolio theory and also discusses a variety of OCP-I problems on different portfolio models. Consider the problem of maximizing the expected terminal value $v(T)$ of the portfolio for a fixed horizon $T$ for the self-financing portfolio model of Section 2.3.1. With $x, z, u, d, V, A, B, D$ as defined therein, we wish to maximize the performance measure given by

$\eta(u, d) = \mathbb{E}(v(T))$

Using Theorem 3.1 we see that the solution to this OCP-I problem is obtained by solving the matrix equation with boundary condition $K(T) = -V$

(3.11) k = -KC - A^{z)K + ^ \K^B{z)\ + ^ \K^D{z)\ 

with the optimum performance measure and controls (in feedback form) 
given by 

$\eta^* = z^T(0) K^T(0) x(0)$
u*{t,z,x) = —^sgn{z'^K{t)'^B{z)x) 

d*{t,z,x) = —^sgn{z'^K{t)'^D{z)x) 

with $K(t)$ being the solution to (3.11). Some solutions of (3.11) and the corresponding optimal controls for $T = 15$ are plotted in Figure 3 for various initial conditions (mixes of the assets in the portfolio initially). Results also show that as $T \to \infty$, the value of $\eta^*$ approaches a constant value of 1.24 regardless of the initial values $z(0), x(0)$. That is, the maximal possible terminal value for the portfolio is 1.24. However, we do not see a steady-state constant value for the optimal controls $u^*(z,x)$ and $d^*(z,x)$; near the portfolio expiration date, more vigorous buying/selling activity is necessary. If the matrix $C$ were reducible or time-varying in our example, multiple steady states would be possible as $T \to \infty$ and the initial trading activity would be more significant.

[Figure: Optimal Portfolio Values for Initial Portfolio Mixes (z(0), x(0))]

Fig 3. Minimum Return Function $k(t,z,x)$ for the self-financing portfolio with maximal terminal wealth, shown for various values of $z, x$ specified as $(e_i, e_j)$ vectors.


Two instances of simulation of application of the above optimal controls are shown in Figures 4 and 5. In the first case we see that one is able to benefit from $x(0)$ being in state $e_2$, which is the one that corresponds to the maximal value of the portfolio, but in which state no trading can take place. We can hold that value, and it more than offsets any devaluation due to stock price decline since the stock is more likely to have a higher price than a lower one. In the second simulation, we are unable to achieve state $x = e_2$, which happens because this state can be attained only in the less probable case of a lower stock price. However, the optimal strategy still tries to maximize the portfolio value by forcing state $x = e_3$ when the price is lower, but since this state is less likely, we need only switch to this sell-out strategy for a small portion of the time. The final value is most sensitive to the final trading activity. The optimal strategy allows us to maximize the portfolio value in all cases and, on average, gives us the best value.

Our approach of using a cascade model is a more realistic model for a portfolio as it is dynamically hedged. Traditional Gauss-Markov models assume continuous hedging, which is unrealistic. Our model can be easily extended to include features such as transaction costs. Furthermore, by modeling it as a cascade, we have a computationally scalable solution. The computation time as a function of the dimensionality of $z_t$ for a decomposed representation and a fully coupled representation (using Bellman equations on the joint process directly) for various expiration times is shown in Figure 6. We see that the computation time for the solution on a coupled state space grows exponentially with the dimensionality of $z_t$ whereas our solution scales linearly.

[Figure: Simulation of Optimal Strategy Execution for z(0)=e1, x(0)=e2]

Fig 4. Simulation 1 of optimal control for the self-financing portfolio in Section 3.4

[Figure: Simulation of Optimal Strategy Execution for z(0)=e1, x(0)=e2]

Fig 5. Simulation 2 of optimal control for the self-financing portfolio in Section 3.4

[Figure panel: (c) T = 100]

Fig 6. CPU time in seconds for the asset/bond self-financing toy problem when the number of states of $z_t$ (possible price combinations) increases, for different expiration times $T = 1, 10, 100$. The decoupled solution scales with dimensionality whereas the coupled solution does not.

4. Optimal Control Problem Type II: Diversification Maximization. The second category of cascade MDP problems are those of optimization of functionals that are non-linear with respect to the probabilities $p_{ij}$, such as portfolio diversification or fairness of choices in decision making. As alluded to in the introduction, these problems are often singular in the sense that the dynamic programming or maximum principle fail to give a solution, and we will explore this through an example. In general, this class of problems falls in the category where the performance measure to be optimized is a non-linear function of expectation. That is, for a non-linear $f$ we want to minimize

(4.1) $\eta(u) = \int_0^T f\left( \mathbb{E}\left( l(t, z_t, x_t, u) \right) \right) dt$

where $l(\cdot)$ is some loss function. For example, $\eta(u) = \int_0^T \mathbb{E}\left( f(x) - \mathbb{E} f(x) \right)^2 dt$ minimizes the variance of the function $f$ and $\eta(u) = \int_0^T \left( \mathbb{E} x_1 - \mathbb{E} x_2 \right)^2 dt$ specifies adherence to a particular state.


4.1. Quadratic Problem with No Control Cost. We formulate a problem that is a particular case of (4.1). Given a cascade $(z_t, x_t, \mathcal{U})$ with model $(A, B_i)$ we define Problem OCP-II as the optimal control problem

(4.2) $\eta^* = \min_{u \in \mathcal{U}} \lim_{T \to \infty} \frac{1}{T} \int_0^T \left( p_t^T Q p_t + m^T p_t \right) dt$

where $Q \geq 0$, $m$ is a vector and $p_t$ is the marginal probability vector of $x_t$. We note that the stochastic dynamic programming principle is not directly applicable to a problem of the form (4.1), and application of variational techniques at best gives us a two-point boundary value problem. Even if we did not have a cascade, a functional of the above form can result in singular arcs. To see this heuristically, consider the optimal control problem on a non-cascade defined as

(4.3) $\dot p = \left( A + \sum_i u_i B_i \right) p$

$\eta = \lim_{T \to \infty} \frac{1}{T} \int_0^T p^T Q p \, dt$

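with $\mathcal{U} = \{ a_i \le u_i \le b_i \}$. The costate and Hamiltonian equations for this problem are

$\dot q = -2Qp - A^T q - \sum_i u_i B_i^T q$

$H = q^T A p + \sum_i q^T u_i B_i p + p^T Q p$

so that

$\frac{\partial H}{\partial u_i} = q^T B_i p$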


If $q^T B_i p = 0$ for any finite time interval, then we have a singular arc so that the Hamiltonian provides no useful information. Characterizing solutions to such singular optimal control problems is notoriously hard. To see how we can get a singular arc in the case above, consider a simplification of (4.3) with $A + uB$ of the form $A + u f e_j^T$ for $u \in [a, b]$. For example,

$B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \end{pmatrix} = (-e_2 + e_3) e_2^T = f e_2^T$

In steady state, one can show that

$p(u) = p(0) - \frac{(e_j^T p(0)) \, u \, A^+ f}{1 + u \, (e_j^T A^+ f)}$

where $A^+$ is the Moore-Penrose inverse of $A$. If $\eta(u)$ were of the form $m^T p$ (the "usual" stochastic control case), then $u^*$ lies on the boundary of $\mathcal{U}$. In the case where $\eta(u)$ is of the form $p^T Q p$ (i.e. the non-linear stochastic control case), it is possible that $u^*$ is in the interior of $\mathcal{U}$. For the class of constant controls, if $u^* \in \mathrm{Int}(\mathcal{U})$ then one can show by computation that the corresponding $(p(u), q(u))$ correspond to singular arcs. The above argument heuristically shows why the quadratic control problem OCP-II can be singular.
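The contrast between the linear and quadratic cases can be seen on a very small example. The sketch below uses a hypothetical two-state chain (not from the paper) with generator $A + u f e_1^T$, $A = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}$, $f = (-1, 1)^T$, restricted to constant controls $u \in [-0.5, 0.5]$, and a plain grid search. It only illustrates where the steady-state minimizer lands; it does not compute costates or verify singularity.

```python
# Sketch: steady-state cost as a function of a constant control u for a
# hypothetical two-state chain with generator [[-(1+u), 1], [1+u, -1]].

def steady_state(u):
    """Stationary distribution: balance (1+u)*p1 = p2 with p1 + p2 = 1."""
    return (1.0 / (2.0 + u), (1.0 + u) / (2.0 + u))

def argmin_on_grid(cost, lo, hi, steps=1000):
    """Return the grid point in [lo, hi] with the smallest cost."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=cost)

def linear_cost(u):
    """m^T p with m = (1, 0): the 'usual' expected-cost case."""
    return steady_state(u)[0]

def quadratic_cost(u):
    """p^T Q p with Q = I: a non-linear function of the probabilities."""
    return sum(x * x for x in steady_state(u))

u_lin = argmin_on_grid(linear_cost, -0.5, 0.5)
u_quad = argmin_on_grid(quadratic_cost, -0.5, 0.5)
```

Here the linear cost is monotone in $u$, so its minimizer sits on the boundary of the interval, while the quadratic cost is smallest when the two probabilities are equal, which happens at an interior value of $u$.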

For a non-cascade, however, the steady state optimal control problem reduces to a non-functional optimization problem, i.e. that of minimizing $p^T(u) Q p(u) + m^T p(u)$. However, for a cascade, $\eta$ depends on the marginal probabilities of $x_t$, but it is the conditional probabilities of $x_t \mid z_t$ that evolve in accordance with $\dot p = Ap$. In general, it is difficult to get an expression for the steady state marginal probabilities $p(u)$ of $x_t$, but we will below consider a special diagonalizable case where $p(u_i)$ satisfy $A_i p = 0$, where $p$ represents the marginal probability vector of $x_t$.


4.2. "Cat's Dilemma" Revisited. In the model presented in Section 2.3.2, the combination of dishes available is random and the cat needs to optimize its selection strategy so as to get a balance of all three dishes. If we assume $s = f = 1$ we note that $\mathbb{E}(x_4) \to \frac{1}{2}$ as $t \to \infty$ regardless of $z$ or $u$. Hence, the best balance of foods is achieved when each of $\mathbb{E}(x_1)$, $\mathbb{E}(x_2)$, $\mathbb{E}(x_3)$ is as close as possible to $\frac{1}{6}$. Hence the problem can be defined as one of minimizing the performance measure

(4.4) $\eta(u) = \lim_{T \to \infty} \frac{1}{T} \int_0^T \left\| \mathbb{E}(Q x(t, z, u)) - m \right\|^2 dt$

where $\| \cdot \|$ is the Euclidean norm on $\mathbb{R}^4$ and $Q, m$ are defined as


(4.5) 


/I 0 0 0\ 
0 10 0 

1 

1 

0 0 10 

; m = - 
0 

1 

Vo 0 0 Oyi 


VO/ 


4.3. A Binary Decision Problem. The cat's dilemma can be generalized
to a class of problems where one needs to make a choice given two possibilities
at a time, so as to maximize the diversity of outcomes as a result of
one's choices. If the total number of outcomes is $N$ then binary possibilities
are represented by the Markov process $z(t) \in \{e_i\}_{i=1}^r$, $r = \frac12 N(N-1)$, having
generator $C$, and the outcomes by the cascade Markov process
$x(t) \in \{e_i\}_{i=1}^n$, $n = N + 1$, with transition matrix as in (2.10) with $r$ and $n$
dimensional analogs for $A_0$, $A(z)$ and $B(z)$. The admissibility set $U$ of controls
$u(t,z,x)$ is the set of functions $u : \mathbb{R}^+ \times \{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n \to [a, b]$ such
that for each $z$ and $t$ the matrix $P(t,z,u)$ is stochastic. (We can generalize
to the situation where, for example, $B(e_{ij}) = (e_i - e_j) e_n^T$ where $e_{ij}$ is choice
$(i,j)$, etc.) This cascade model has simpler representations as follows. We
will assume $s = f = 1$.

Proposition 4.1. The model described in Section 4.3 has the following
properties.

1. (Open loop w.r.t. $x$) For all $t$, $z$ the dynamics of $x(t)$ do not depend
on the controls $u(t,z,x)$ for all $x \neq e_n$.

2. (Open loop representation w.r.t. $z$) There exist rank one matrices
$A_j$, $B_j$ of the form $f_j e_n^T$ and (open-loop) controls $u_j : \mathbb{R}^+ \to [a, b]$
for $j = 1..r$ such that the transition matrix (2.10) can be written as

(4.6)   $P(t, e_j, u) = A_0 + A_j + B_j u_j(t)$, for $j = 1..r$

3. (Triangular Representation) The marginal probabilities $c_j(t) = \Pr\{z(t) = e_j\}$
and $p_k(t) = \Pr\{x(t) = e_k\}$ satisfy the triangular equations

(4.7)   $\dot c(t) = C c(t)$
        $\dot p(t) = \Big(A_0 + \sum_{j=1}^r c_j (A_j + B_j u_j(t))\Big) p(t)$

where $p(t) = [p_1(t) \ldots p_n(t)]^T$ and $c(t) = [c_1(t) \ldots c_r(t)]^T$.

Proof. Since $B(e_j)$ is of rank one and of the form $f_j e_n^T$, where $f_j$ is
a column vector, the dynamics of $x(t)$ depend only on the value of the control in
state $x = e_n$ and on $z$. Thus, w.l.o.g., write $u(t,z,x)$ as $u(t,z)$ instead. The open loop
representation (4.6) w.r.t. $z$ is made possible by using the parametrization
$u_j(t) = u(t, e_j)$ with $B_j = B(e_j)$ and $A_j = A(e_j)$. The triangular representation
follows from Corollary B.2 (Appendix B) since $(A_j + u_j B_j) e_k = 0$
for $j = 1..r$, $k = 1..(n-1)$, and since the form of $A_0$ in (2.10) above
implies that $\Pr(x = e_n)$ is independent of $\Pr(z = e_j)$ for $j = 1..r$. □
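
The triangular equations (4.7) are straightforward to integrate numerically. The sketch below is illustrative only: it assumes the three-dish generator with $s = f = 1$ (last column as in the explicit $X(u)$ of the cat's dilemma), a hypothetical cyclic generator `C` for $z(t)$, and a plain forward-Euler scheme; none of these choices is prescribed by the text.

```python
import numpy as np

def cat_generator(c, u):
    """X = A0 + sum_j c_j (A_j + u_j B_j) for three dishes (n = 4), s = f = 1."""
    c1, c2, c3 = c
    u1, u2, u3 = u
    X = np.zeros((4, 4))
    X[:3, :3] = -np.eye(3)   # columns 1..3: leave e_i at rate 1 ...
    X[3, :3] = 1.0           # ... and jump to e_4
    # Only the last column depends on c and u (rank-one terms f_j e_4^T).
    X[:, 3] = [c3 * (u3 + 0.5) - c2 * (u2 - 0.5),
               c1 * (u1 + 0.5) - c3 * (u3 - 0.5),
               c2 * (u2 + 0.5) - c1 * (u1 - 0.5),
               -1.0]
    return X

def simulate(C, c0, p0, u_of_t, T=10.0, dt=1e-3):
    """Forward-Euler integration of c' = C c and p' = X(c, u(t)) p."""
    c = np.array(c0, dtype=float)
    p = np.array(p0, dtype=float)
    for k in range(int(round(T / dt))):
        X = cat_generator(c, u_of_t(k * dt))
        c, p = c + dt * (C @ c), p + dt * (X @ p)
    return c, p

# Hypothetical generator for z(t); columns sum to zero.
C = np.array([[-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
c_T, p_T = simulate(C, [1, 0, 0], [0, 0, 0, 1], lambda t: (0.0, 0.0, 0.0))
print(c_T, p_T)
```

Because the $c$-equation does not involve $p$, the pair can be integrated in one pass; any open-loop control `u_of_t` with values in the admissible interval can be substituted.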

The performance measure to maximize diversification of outcomes is (4.4),
which can be written using the notation introduced in Proposition 4.1 (with $p(t)$
explicitly written as $p(t,u)$ instead) as

(4.8)   $\eta(u) = \lim_{T \to \infty} \frac{1}{T} \int_0^T (Qp(t,u) - m)^T (Qp(t,u) - m)\, dt$

Two classes of optimal control problems are discussed. 

4.4. Problem 1: The Steady State Case. We assume $z(t)$ to be stationary®.
Given a fixed value $T_0$, the admissibility set $U_{T_0}$ is restricted to
the set of functions $u_j(t)$, $j = 1..r$, that are constant for $t > T_0$, as per the
representation defined in (4.6). With $\eta(u)$ as in (4.8), the optimization problem
is

(4.9)   $\eta^* = \min_{u \in U_{T_0}} \eta(u), \quad u^* = \arg\min_{u \in U_{T_0}} \eta(u)$

We will call this problem OCP-IIS.


Theorem 4.2. The solution to the optimization problem OCP-IIS is
given by the solution to the quadratic programming problem

$\eta^* = \min_u \; \frac12 u^T H u + f^T u + k$, subject to $-\frac12 e \le u \le \frac12 e$

where $u \in \mathbb{R}^r$, $H = \frac12 A^T A$, $f = A^T b$, $k = b^T b$ with matrix $A$ and vector
$b$ depending on $(c_1, c_2 \ldots c_r)$ only, and if $u^0$ is the minimizing value for the
above, then any function $u(t)$ such that $u(t) = u^0$ for $t > T_0$ is an optimal
control $u^*$.


Proof. The infinitesimal generator $X(u)$ for $x(t)$ defined in (4.7) is irreducible.
Writing the unique time invariant solution to (4.7) as $p(u)$, a routine
calculation shows that

(4.10)   $p(u) = (ee^T + X^T(u)X(u))^{-1} e$

For $u \in U_{T_0}$ we can write

$\eta(u) = \lim_{T \to \infty} \frac{1}{T} \int_0^{T_0} (Qp(t,u) - m)^T (Qp(t,u) - m)\, dt + \lim_{T \to \infty} \frac{1}{T} \int_{T_0}^{T} (Qp(u) - m)^T (Qp(u) - m)\, dt$

$\phantom{\eta(u)} = (Qp(u) - m)^T (Qp(u) - m)$

® If the generator $C$ of $z(t)$ is irreducible then eventually $z(t)$ will attain a time invariant
distribution and hence the solution is no different.
This holds since the first integrand $(Qp(t,u) - m)^T(Qp(t,u) - m)$ is bounded and the
second integrand $(Qp(u) - m)^T(Qp(u) - m)$ is independent of $t$. Using
(4.10), write $(Qp(u) - m)^T(Qp(u) - m) = \rho^T \rho$ where $\rho = \frac12 Au + b$ and $A$, $b$
are as per the statement. Expanding, we get the quadratic programming
equation. □


Claim 4.3. The quadratic programming equation (Theorem 4.2) has a
solution $\eta^* = 0$ if and only if the corresponding minimizing value lies in
the interior of the hypercube $[-\frac12, \frac12]^r$.


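Remark 4.4. Theorem 4.2 can also be proved using explicit computation
for the Cat's Dilemma, with $X(u)$ and $p(u)$ given by

$X(u) = \begin{bmatrix} -1 & 0 & 0 & c_3(u_3 + \frac12) - c_2(u_2 - \frac12) \\ 0 & -1 & 0 & c_1(u_1 + \frac12) - c_3(u_3 - \frac12) \\ 0 & 0 & -1 & c_2(u_2 + \frac12) - c_1(u_1 - \frac12) \\ 1 & 1 & 1 & -1 \end{bmatrix}$

$p(u) = \frac12 \begin{bmatrix} c_3(u_3 + \frac12) - c_2(u_2 - \frac12) \\ c_1(u_1 + \frac12) - c_3(u_3 - \frac12) \\ c_2(u_2 + \frac12) - c_1(u_1 - \frac12) \\ 1 \end{bmatrix}$

and $A$, $b$ thus being computed as

$A = \begin{bmatrix} 0 & -c_2 & c_3 \\ c_1 & 0 & -c_3 \\ -c_1 & c_2 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} -\frac16 + \frac14(c_3 + c_2) \\ -\frac16 + \frac14(c_1 + c_3) \\ -\frac16 + \frac14(c_2 + c_1) \end{bmatrix}$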
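
The explicit forms in this remark can be checked numerically against the general steady-state formula (4.10). The following is a minimal sketch, assuming the three-dish generator $X(u)$ with $s = f = 1$ and $c_1 + c_2 + c_3 = 1$; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def generator(c, u):
    """Three-dish generator X(u); columns sum to zero when sum(c) == 1."""
    c1, c2, c3 = c
    u1, u2, u3 = u
    X = np.zeros((4, 4))
    X[:3, :3] = -np.eye(3)
    X[3, :3] = 1.0
    X[:, 3] = [c3 * (u3 + 0.5) - c2 * (u2 - 0.5),
               c1 * (u1 + 0.5) - c3 * (u3 - 0.5),
               c2 * (u2 + 0.5) - c1 * (u1 - 0.5),
               -1.0]
    return X

def steady_state(X):
    """Invariant distribution via (4.10): p = (e e^T + X^T X)^{-1} e."""
    e = np.ones(X.shape[0])
    return np.linalg.solve(np.outer(e, e) + X.T @ X, e)

def eta_direct(c, u):
    """(Qp - m)^T (Qp - m) with Q = diag(1,1,1,0), m = (1/6,1/6,1/6,0)."""
    p = steady_state(generator(c, u))
    Q = np.diag([1.0, 1.0, 1.0, 0.0])
    m = np.array([1 / 6, 1 / 6, 1 / 6, 0.0])
    r = Q @ p - m
    return float(r @ r)

def eta_quadratic(c, u):
    """rho^T rho with rho = A u / 2 + b, using A and b from the remark."""
    c1, c2, c3 = c
    A = np.array([[0.0, -c2, c3],
                  [c1, 0.0, -c3],
                  [-c1, c2, 0.0]])
    b = np.array([(c3 + c2) / 4, (c1 + c3) / 4, (c2 + c1) / 4]) - 1 / 6
    rho = 0.5 * A @ np.asarray(u, dtype=float) + b
    return float(rho @ rho)

c, u = (0.2, 0.3, 0.5), (0.1, -0.2, 0.3)
print(steady_state(generator(c, u)), eta_direct(c, u), eta_quadratic(c, u))
```

Solving the linear system in (4.10) rather than forming the inverse is a numerical convenience; both routes give the same $p(u)$.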


Remark 4.5. We can solve the quadratic programming problem explicitly. The
solutions $u^0 \in C$, where $C$ is the closed cube $[-\frac12, \frac12]^3$. For the general case of
dimensions $r$ and $n$ the results are similar.

Case 1: When $0 < c_j < \frac23$, $j = 1..3$. In this case, $\eta^* = 0$ and optimal
solutions are given by the lines $u_1 = \frac{1}{2c_1}(c_2 + 2c_3u_3 - \frac13)$, $u_2 =
\frac{1}{2c_2}(-c_1 + 2c_3u_3 + \frac13)$ in the interior of $C$.

Case 2: When $c_j < \frac23$ for all $j$,
and $c_j = 0$ for some $j$, $j = 1..3$. In this case $\eta^* = 0$ and the solutions are
given by, for example, in the case $\{c_1 = 0$, $c_3 < \frac23$ and $c_2 < \frac23\}$ the set
of lines $u_3 = -\frac{1}{3c_3} + \frac12$, $u_2 = \frac{1}{3c_2} - \frac12$ in the interior of $C$ but parallel to the
faces.

Case 3: When $\frac23 < c_j < 1$ for some $j$. Since $H$ is singular, several
local minima may exist. However, the isolines of global minima are attained
along constant values of $c_j$ in the case of $\frac23 < c_j < 1$, and the minimal values
increase from 0 for $c_j = \frac23$ to 0.0408 for $c_j = 1$. For example, if $c_2 > 0$ then
at most two global minima are attained at $u = (0, -\frac12, \frac12)$ or $u = (0, \frac12, \frac12)$,
i.e. on the edges of $C$. If $c_2 = 0$ then the global minimum is attained on the
line $u_1 = 0$, $u_3 = \frac12$.
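
The zero-cost lines of Case 1 can be verified directly. The sketch below is an assumption-laden check rather than the paper's computation: it takes $\rho = \frac12 Au + b$ for the three-dish example and obtains the line by setting each of the first three steady-state probabilities to $\frac16$; `zero_cost_point` is a hypothetical helper.

```python
import numpy as np

def eta(c, u):
    """Steady-state cost rho^T rho with rho = A u / 2 + b (three-dish case)."""
    c1, c2, c3 = c
    A = np.array([[0.0, -c2, c3],
                  [c1, 0.0, -c3],
                  [-c1, c2, 0.0]])
    b = np.array([(c2 + c3) / 4, (c1 + c3) / 4, (c1 + c2) / 4]) - 1 / 6
    rho = 0.5 * A @ np.asarray(u, dtype=float) + b
    return float(rho @ rho)

def zero_cost_point(c, u3):
    """Point on the eta = 0 line for a chosen u3 (needs c1 > 0 and c2 > 0)."""
    c1, c2, c3 = c
    u1 = (c2 + 2 * c3 * u3 - 1 / 3) / (2 * c1)
    u2 = (-c1 + 2 * c3 * u3 + 1 / 3) / (2 * c2)
    return np.array([u1, u2, u3])

c_interior = (0.5, 0.3, 0.2)          # all c_j below 2/3
u_line = zero_cost_point(c_interior, 0.1)
print(u_line, eta(c_interior, u_line))

c_boundary = (0.8, 0.1, 0.1)          # c_1 above 2/3: zero cost is not attainable
print(eta(c_boundary, (0.0, -0.5, 0.5)))
```

For `c_boundary` the first dish can receive at most probability $(c_2 + c_3)/2 = 0.1 < \frac16$, which is why the cost stays strictly positive there.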

4.5. Problem 2: The Time Varying Case. Again, we assume $z(t)$ to be
stationary. The admissibility set $U$ is the set of functions $u_j(t)$, $j = 1..r$,
such that $u_j(t) \in [-\frac12, \frac12]$, as per the representation defined in (4.6). With
$\eta(u)$ as in (4.8), the problem is

(4.11)   $\eta^* = \min_{u \in U} \eta(u), \quad u^* = \arg\min_{u \in U} \eta(u)$

which we call OCP-IIT. In the cases where the steady state optimal control
lies in the interior of $U$, these controls are also optimal within the class of
time-varying controls.

Proposition 4.6. In the cases described in the example of Section 4.4 where
the solution to the quadratic programming equation (Theorem 4.2) lies in
the interior of the hypercube, the solution to OCP-IIS defined in Theorem 4.2
for any $T_0$ is also a solution to OCP-IIT.

Proof. In the cases of the example of Section 4.4 where the optimal
controls are in the interior, the optimal performance measure is $\eta^* = 0$. Since
the performance measure $\eta$ defined in (4.4) always satisfies $\eta \ge 0$, in
these cases a constant control is also optimal within the class of time-varying
controls. This holds for constant controls in the class $U_{T_0}$ for any finite
$T_0$ (and thus the optimal control is by no means unique). □

4.6. Singularity Of Optimal Controls. The problems in Section 4.3 belong
to the category of singular control, and an analysis of the singularity of
optimal solutions presents a slightly more general approach to finding the
solution to the time-varying problem (4.11) than the approach above. For
this problem, using the representation of Proposition 4.1, the Hamiltonian,
state and costate equations can be written as

(4.12)   $H = (Qp - m)^T(Qp - m) + q^T\Big(A_0 + \sum_{j=1}^r c_j A_j + \sum_{j=1}^r c_j u_j B_j\Big)p$

(4.13)   $\dot p = \Big(A_0 + \sum_{j=1}^r c_j (A_j + B_j u_j(t))\Big)p$

(4.14)   $\dot q = -2(Qp - m) - \Big(A_0^T + \sum_{j=1}^r c_j A_j^T\Big)q - \Big(\sum_{j=1}^r c_j u_j B_j^T\Big)q$
However, we see from (4.12) that the costate and state equations are no
longer decoupled, and thus trajectories $(q, p)$ that minimize the Hamiltonian
cannot simply be obtained by solving an equivalent minimization of the
individual costate/state equations. In fact, as shown below, we have the
case of singular arcs, that is, trajectories (solutions) where $q^T B_i p = 0$.
Such trajectories fail to give a minimization condition for $H$ with respect
to $u_i$. In such cases, the Maximum Principle at best remains a necessary
condition, failing to provide the optimal solution. Controls $u_i$ such that the
corresponding solutions $(p, q)$ to the state/costate equations form singular
arcs will be called singular controls.
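
To make the singularity condition concrete, the switching functions $\partial H / \partial u_j = c_j\, q^T B_j p$ can be evaluated for the three-dish example. The sketch below assumes rank-one control matrices $B_j = f_j e_4^T$ read off from the last column of the example generator; the vectors in `F` are therefore an assumption of this illustration, not a statement from the text.

```python
import numpy as np

# Assumed control directions f_j (rows), so that B_j = f_j e_4^T.
F = np.array([[0.0, 1.0, -1.0, 0.0],    # f_1
              [-1.0, 0.0, 1.0, 0.0],    # f_2
              [1.0, -1.0, 0.0, 0.0]])   # f_3

def switching_functions(c, p, q):
    """Return dH/du_j = c_j * q^T B_j p for j = 1..3."""
    c, p, q = (np.asarray(v, dtype=float) for v in (c, p, q))
    # B_j p = f_j * p_4, hence q^T B_j p = (q . f_j) * p_4.
    return c * (F @ q) * p[3]

def is_singular(c, p, q, tol=1e-12):
    """True when every switching function vanishes (a singular point)."""
    return bool(np.all(np.abs(switching_functions(c, p, q)) < tol))

c = [0.2, 0.3, 0.5]
p = [1 / 6, 1 / 6, 1 / 6, 1 / 2]
print(switching_functions(c, p, [1.0, 2.0, 3.0, 0.0]))
print(is_singular(c, p, [1.0, 1.0, 1.0, 5.0]))
```

When all switching functions vanish, $H$ no longer depends on $u$ at that point, which is why the minimization condition gives no information about the control.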

Proposition 4.7. For $t > T_0$, the solutions $u^*$ to the optimal control
problem OCP-IIS that lie in the interior of $U$ are singular.

Proof. As $T \to \infty$, $u^*$ is a constant control and so $p$ reaches an invariant
distribution. Since the optimal trajectory must satisfy the state/costate
equations, we see that $\dot q$ must be zero as well. Thus, from (4.12)-(4.14) we get, by
putting $X(u) = A_0 + \sum_{j=1}^r (c_j A_j + c_j u_j B_j)$,

$-2(Qp - m) - X^T(u) q = 0$

Expanding the above for the first $(n-1)$ rows of $X^T(u)q$ we get the equations
$q_n - q_i = -2(p_i - \frac{1}{2(n-1)})$ for $i = 1..n-1$. These give us the equations $q_i - q_j =
2(p_i - p_j)$ for $i,j = 1..n-1$. The singularity conditions $q^T B_i p = 0$ expand,
by putting in the steady value of $p(u)$, to $q_i - q_j = 0$ for $i,j = 1..n-1$.
Since $p_i = p_j = \frac{1}{2(n-1)}$ when $u^*$ is in the interior of $U$, we see that the optimal
solutions are singular. □

Thus, in the steady state case, optimal trajectories are singular. We now 
show that this is also the case for the time-varying case. 

Proposition 4.8. For the problem OCP-IIT, the value of the Hamiltonian
on singular arcs is zero.

Proof. The state, costate and Hamiltonian are given by (4.12)-(4.14). Without loss
of generality, let $p(0) = e_n$. The state equations can be solved explicitly for
$p_n$ using $\dot p_n = 1 - 2p_n$ to yield $p_n(t) = \frac12(1 + e^{-2t})$. Singular arcs satisfy
$q^T B_j p = 0$, which expands to $p_n(q_i - q_j) = 0$ for $i,j = 1..(n-1)$, i.e. $q_i = q_j$
for $i,j = 1..n-1$. From $q_i(\infty) = 0$ we get $\dot q_i = \dot q_j$, or $p_i = p_j$ for $i,j =
1..n-1$, using the costate equation. Using $\sum_{i=1}^n p_i = 1$ we get the solution
$p_i(t) = \frac{1}{2(n-1)}(1 - e^{-2t})$ for $i = 1..(n-1)$. Now plugging these into the costate
equations we can explicitly solve for $q_i$, $i = 1..n$, for the terminal condition
$q_i(\infty) = 0$. Omitting details, plugging the solutions into the Hamiltonian, it
can be readily seen that $H = 0$. □
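
The closed forms for the state used in this proof follow from a short calculation. The derivation below is a sketch under the assumption that the last row of the generator is $(1, \ldots, 1, -1)$, as in the cat's dilemma example with $s = f = 1$, and that $p_i = p_j$ on the singular arc:

```latex
\begin{align*}
\dot p_n &= \sum_{i=1}^{n-1} p_i - p_n = (1 - p_n) - p_n = 1 - 2p_n, \qquad p_n(0) = 1, \\
p_n(t) &= \tfrac12 + \left(p_n(0) - \tfrac12\right) e^{-2t} = \tfrac12\left(1 + e^{-2t}\right), \\
(n-1)\, p_i(t) &= 1 - p_n(t) = \tfrac12\left(1 - e^{-2t}\right)
\;\Longrightarrow\;
p_i(t) = \tfrac{1}{2(n-1)}\left(1 - e^{-2t}\right), \quad i = 1..(n-1).
\end{align*}
```

As $t \to \infty$ these tend to $p_n = \frac12$ and $p_i = \frac{1}{2(n-1)}$, which for $n = 4$ is the balanced value $\frac16$.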

Corollary 4.9. The solutions $u^*$ to the optimal control problem (4.11),
such that $\lim_{t \to \infty} u^*(t)$ lies in the interior of $U$, are singular.

Proof. In steady state, we see that the optimal trajectories (for which
$u$ is in the interior of $U$) yield $H = 0$, since $(Qp - m)^T(Qp - m) = 0$ and
$X(u)p = 0$. From the Maximum Principle, this must be the minimizing value
of $H$, and since there is no explicit dependence of $H$ on $t$ this must be the
value of $H$ on optimal trajectories at all times. Hence, singular trajectories
that satisfy the state/costate equations also minimize the Hamiltonian, and
so the entire optimal trajectory is singular. □

Now we show that singular solutions are also optimal for the case when 
optimal controls are in the interior of U. 

Proposition 4.10. For the problem OCP-IIT the value of $\eta$ as defined
in (4.8) on singular arcs is zero.

Proof. As in the proof of Proposition 4.8, on singular arcs the conditions
$\frac{\partial H}{\partial u_j} = 0$, i.e. $q^T B_j p = 0$ for $j = 1..r$, give $q_i = q_j$ for $i,j = 1..(n-1)$. Evaluating
$\frac{d}{dt}\frac{\partial H}{\partial u_j}$ for $j = 1..r$ and setting this to zero (details omitted) yields further
the conditions $p_i = p_j$ for $i,j = 1..(n-1)$. Next, evaluating $\frac{d^2}{dt^2}\frac{\partial H}{\partial u_j}$ for
$j = 1..r$ and setting this to zero yields the same equations as in Case 1
and Case 2 of (a generalized version of) the example presented in Section
4.4, that is, the equations corresponding to $(Qp(u) - m)^T(Qp(u) - m) = 0$
where $p(u)$ is given by (4.10). That is, $\eta = 0$. □

Note that due to the singular nature of the problem, the above analysis
does not give us any information about the optimal control $u^*$. However,
we saw from the steady state analysis that a control $u$ that takes a constant
value satisfying the quadratic programming problem (QPP) (Theorem
4.2) after some finite time is an optimal solution. So if we initially start on
a singular trajectory then we remain on it. Otherwise, since $u$ is bounded,
we cannot jump immediately to the singular trajectories, and so the control will be
bang-bang-like until we transition to an optimal (though not
necessarily constant) control; eventually, however, this will become constant.
Thus any control that becomes the constant value that is the solution to the
QPP in finite time, and that eventually steers the system onto a singular
trajectory, is an optimal control.
5. Conclusion. A framework for studying the class of problems where
the dynamics of a controllable continuous-time finite-state Markov chain are
dependent on an external stochastic process was introduced in this paper,
and two categories of optimal control problems were discussed. In the "type
I" or "expected utility maximization" problems, techniques based upon dynamic
programming were used to provide solutions for a general class of
problems in the form of a matrix differential equation. This result, proved
in Theorem 3.1 using stochastic dynamic programming, was alternatively
derived using the maximum principle, and in the process we were able to
see a more general applicability of the variational approach. These solutions
were applied to a variety of toy examples in the area of dynamic portfolio
optimization. Our factored solutions reduce storage requirements as well as
computational complexity significantly. For example, in our representation,
a coupled problem with $r = 10$, $n = 100$ that would normally require storing
a $1000 \times 1000$ matrix needs at most ten $100 \times 100$ matrices, thereby
providing a reduction by a factor of 10. This approach is also generalizable
to multi-factor processes, with many interacting Markov chains and even
with synchronizing transitions.

Another category of problems, called "type II" or "diversification maximization"
problems, with performance functionals that are non-linear in the
underlying state probabilities, was discussed in the context of a cat feeding
example. It was shown that this problem is singular in the sense that
the maximum principle fails to provide an optimal solution, and alternative
techniques were explored in the solution of this problem.

Ongoing and future work in this area is focused on general techniques for 
such singular problems, and extending the class of problems to more com¬ 
plex ones such as multi-cascades (a set of multiple inter-dependent Markov 
chains), hybrid cascades (for instance, a discrete-state Markov chain with de¬ 
pendencies on continuous-state Gauss-Markov processes) and even decision 
processes in the context of quantum Markov chains or quantum controls. 
Computational considerations for large scale versions of the toy portfolio 
examples presented in this paper will also be investigated. 

In this paper only the singular control problem defined in Section 4.3 was
analyzed. The general problem of minimizing a performance measure of the
form

$\eta = \int_0^T \Big(\frac12 p^T Q p + c^T p\Big)\, dt + \frac12 p^T(T) S_f\, p(T) + \phi_f^T p(T)$

on a cascade MDP where $Q, S_f \ge 0$ needs to be investigated. For the time-invariant
case, following the analysis in which it was shown that if a minimizer
of $c^T p$ is in the interior of the admissibility set $U$ then it must define a
singular arc, we would like to derive a similar result for the above case. We
would further like to derive, for the time-invariant case, sufficient conditions
for singular arcs to be optimal (i.e. an analog of Proposition 4.10).

Future work in this class of singular problems also involves other tech¬ 
niques such as variable transformations, as in [2], the method of singular 
perturbations (as in [4]), and numerical methods such as Chebyshev-point 
collocation techniques. 

APPENDIX A: MARKOV PROCESSES ON PRODUCT STATE SPACES

We explore representations of a Markov process $y_t$ that evolves on the
product state space $\{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n$. The sample path $y(t)$ can be written
as the tuple $(z(t), x(t))$ where $z(t) \in \{e_i\}_{i=1}^r$ and $x(t) \in \{e_i\}_{i=1}^n$. The corresponding
stochastic processes $z_t$ and $x_t$ are the components of $y_t$. The transition
matrix for $x_t$ may depend on $z(t)$ and hence describes the propagation
of the conditional probability distribution $p_{x|z}$. The dynamics of the component
marginal probabilities are not necessarily governed by a single stochastic
matrix. Different degrees of coupling between $x_t$ and $z_t$ lead to a possible
categorization of the joint Markov process $y_t$.

Definition A.1. A Markov process $y_t$ on the state space $\{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n$
is called tightly coupled or non-decomposable if there exist states
$(e_i, e_j)$ and $(e_k, e_l)$ with $i \neq k$ and $j \neq l$ having non-zero transition probability.
If all non-zero transition probabilities are between states of the form
$(e_i, e_j)$ to $(e_i, e_k)$, or $(e_i, e_j)$ to $(e_k, e_j)$, then $y_t$ is called weakly-coupled or
decomposable.

Definition A.2. A decomposable chain on $\{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n$ where the
transition probability from state $(e_i, e_j)$ to $(e_l, e_j)$ does not depend on $j$, for
all $i, l, j$ where $1 \le i, l \le r$ and $1 \le j \le n$, is called a Cascade Markov
process*.

Definition A.3. A cascade Markov process on $\{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n$ where
the transition probability from state $(e_i, e_j)$ to $(e_i, e_k)$ does not depend on $i$,
for all $i, j, k$ where $1 \le i \le r$ and $1 \le j, k \le n$, is called an Uncoupled
Markov Process.

* In this paper we mainly focus on Cascade Markov processes, and they are closely
related to Markov-modulated Poisson processes (MMPPs), which have vast applications
in traffic control, operations research, and electronics and communications.
Thus, in a decomposable chain, the jumps in the two component processes
are uncorrelated. However, the rates of the counters (and hence transition
probabilities) in a component can depend on the state of the other component.
In a cascade chain, the rates of the first component ($z_t$) do not depend
on the second component $x_t$. In an uncoupled chain, the component processes
$z_t$ and $x_t$ are completely independent. Decomposable Markov chains
have functional transition rates, that is, the transition rates are state dependent
but do not have any synchronous transitions. Non-decomposable
Markov chains exhibit synchronous transitions: that is, transitions amongst
states of $x_t$ and $z_t$ can occur simultaneously.

A.1. Sample Path and Transition Probability Representations.
It will be convenient to represent sample paths $y(t)$ using the Kronecker
tensor product $y(t) = z(t) \otimes x(t)$ instead of the tuple $(z(t), x(t))$. The state
set of $y(t)$ then becomes the standard basis for $\mathbb{R}^{rn}$. Following the model in (2.1),
sample paths $y(t)$ have the Ito representation

(A.1)   $dy = \sum_{i=1}^{q} G_i\, y\, dN_i$

where the matrices $G_i$ (in the notation of Appendix D) are distinct. Correspondingly, the infinitesimal generator
$P \in \mathcal{P}_{rn}$ can be written as $P = \sum_{i=1}^{q} G_i \lambda_i$ where $\lambda_i$ is the rate of counter
$N_i$. The following results relate decomposability of sample path and transition
probability representations to the various levels of couplings defined
above.

Proposition A.4. Let the Markov process $y_t$ be defined on the state
space $\{e_i\}_{i=1}^{rn}$ with the Ito representation (A.1). Then for each (distinct) $G_i$,
$i = 1..q$ (see notation defined in Appendix D),

1. $y_t$ is a decomposable Markov process if and only if $G_i$ can be written
as either $G_i = E_i^r \otimes G_i^n$ or $G_i = G_i^r \otimes E_i^n$.

2. If $G_i$ can be written as $G_i = E_i^r \otimes G_i^n$ or $G_i = G_i^r \otimes I_n$ then $y_t$ is a
cascade Markov process.

3. If $G_i$ can be written as $G_i = I_r \otimes G_i^n$ or $G_i = G_i^r \otimes I_n$ then $y_t$ is an
uncoupled Markov process.
Proof. 1. To prove sufficiency, write (A.1) as $dy = \sum_{i=1}^{q_1} (E_i^r \otimes G_i^n)(z \otimes x)\, dN_i
+ \sum_{j=q_1+1}^{q} (G_j^r \otimes E_j^n)(z \otimes x)\, dN_j$. Since $(E_i^r \otimes G_i^n)(z \otimes x) =
E_i^r z \otimes G_i^n x$ is a rank one tensor, $zx^T$ is a rank one matrix with exactly
one non-zero row. Thus jumps in $N_i$ change $x$ but not $z$. Conversely,
jumps in $N_i$ that change both $x$ and $z$ must have $d(zx^T)$ of rank $> 1$,
i.e. $G_i (z \otimes x) \neq E^r z \otimes G^n x$ for any $E^r$ and $G^n$.

2. In the decomposable chain, transitions that change $z$ but not $x$ correspond
to terms such as $(G_j^r \otimes I_n)(z \otimes x)\, dN_j = (G_j^r z \otimes x)\, dN_j$. Thus
the transition $G_j^r$ is driven by $N_j$ only, regardless of $x$. Since the $G_j^r$ are
distinct for distinct $j$, each transition in $z$ is independent of the value
of $x$.

3. Follows by repeating the argument of (2) for the terms $(I_r \otimes G_i^n)(z \otimes x)\, dN_i$.

□

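Proposition A.5. Let the Markov process $y_t$ be defined on the joint
state space $\{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n$ with infinitesimal generator $P$. Then, as per the notation
defined in Appendix D,

1. If $y_t$ is decomposable, then $P$ can be written in the form $P = \sum_{i=1}^{r} E_{ii}^r \otimes
B_i^n + \sum_{i=1}^{n} B_i^r \otimes E_{ii}^n$ where $B_i^n$, $B_i^r$ are matrices such that $B_i^n \in \mathcal{P}_n$
and $B_i^r \in \mathcal{P}_r$.

2. If $y_t$ is a cascade Markov process then $P$ can be written as $P =
\sum_{i=1}^{r} E_{ii}^r \otimes B_i^n + C \otimes I_n$ where $C \in \mathcal{P}_r$, and where $B_i^n$ are matrices such
that $B_i^n \in \mathcal{P}_n$.

3. If $y_t$ is an uncoupled Markov process then $P$ can be written as

(A.2)   $P = I_r \otimes A + C \otimes I_n$

where $A \in \mathcal{P}_n$ and $C \in \mathcal{P}_r$.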

Proof. 1. For a decomposable chain, from Proposition A.4 we can
write (with $q_1 = p_1$ and $p_1 + p_2 = q$) $P = \sum_{i=1}^{p_1} (E_i^r \otimes G_i^n \lambda_i) +
\sum_{j=p_1+1}^{q} (G_j^r \lambda_j \otimes E_j^n)$. The result follows from the fact that sums of the form
$\sum_{i=1}^{m} G_i^d \lambda_i$ lie in $\mathcal{P}_d$ for any integers $m$ and $d$, and by shifting the summation index
in the second sum.

2. Follows from Proposition A.4(2) by setting $C = \sum_{j=p_1+1}^{q} G_j^r \lambda_j$, noting
that $C \in \mathcal{P}_r$.

3. Follows from Proposition A.4(1) as above.

□
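
The cascade form in part 2 is easy to assemble with Kronecker products. The sketch below is an illustration under stated assumptions: generators act on column probability vectors (columns sum to zero), the joint state is ordered as $z \otimes x$, and `A_list[i]` plays the role of the generator of $x$ while $z = e_{i+1}$. The helper names are hypothetical.

```python
import numpy as np

def cascade_generator(C, A_list):
    """P = sum_i E_ii (x) A_i + C (x) I_n for the joint process y = z (x) x."""
    C = np.asarray(C, dtype=float)
    r = C.shape[0]
    n = np.asarray(A_list[0]).shape[0]
    P = np.kron(C, np.eye(n))            # z jumps, x unchanged
    for i, A_i in enumerate(A_list):
        E_ii = np.zeros((r, r))
        E_ii[i, i] = 1.0
        P += np.kron(E_ii, np.asarray(A_i, dtype=float))  # x jumps while z = e_i
    return P

def z_marginal_map(r, n):
    """Matrix S such that S @ p_y is the marginal distribution of z."""
    return np.kron(np.eye(r), np.ones((1, n)))

C = np.array([[-2.0, 1.0],
              [2.0, -1.0]])
A1 = np.array([[-1.0, 0.0, 0.0], [1.0, -1.0, 0.0], [0.0, 1.0, 0.0]])
A2 = np.array([[0.0, 1.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, -1.0]])
P = cascade_generator(C, [A1, A2])
S = z_marginal_map(2, 3)
print(P.sum(axis=0))   # column sums of the joint generator
```

The identity `S @ P == C @ S` expresses the cascade property: the marginal of $z$ evolves under $C$ alone, whatever the $x$-generators are.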

The transition matrix representation (A.2) above is not unique to an
uncoupled Markov process. In fact, any Markov process $y_t$ on the joint state
space $\{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n$ whose transition matrix $P$ can be written in the form (A.2)
is said to be diagonalizable. We will shortly see some sufficient conditions
for diagonalizability in the context of MDPs. An important property of
diagonalizable Markov processes is that the marginal probabilities of the
component processes evolve in accordance with stochastic matrices given by
the diagonal decomposition, and in fact this condition is also sufficient to
guarantee diagonalizability:

Proposition A.6. Given a diagonalizable Markov process $y_t = z_t \otimes x_t$
whose transition matrix has the diagonal representation (A.2), the marginal
probability distributions $p_z$ and $p_x$ of the component processes $z_t$ and $x_t$
evolve in accordance with $\dot p_z(t) = C p_z(t)$ and $\dot p_x(t) = A p_x(t)$ respectively.

Conversely, given a decomposable Markov process $y_t = z_t \otimes x_t$ such that
the marginal probability distributions $p_z$ and $p_x$ of $z_t$ and $x_t$ evolve on $\{e_i\}_{i=1}^r$
and $\{e_i\}_{i=1}^n$ in accordance with $\dot p_z(t) = C p_z(t)$ and $\dot p_x(t) = A p_x(t)$ respectively,
where $A \in \mathcal{P}_n$ and $C \in \mathcal{P}_r$, then $y_t$ is diagonalizable with the representation
given by (A.2).
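
The forward direction of this proposition can be illustrated numerically. The sketch below assumes the diagonal form (A.2), generators acting on column probability vectors, and a forward-Euler propagation; the matrices `C`, `A` and the initial joint distribution are arbitrary illustrative values.

```python
import numpy as np

def diagonal_generator(C, A):
    """Diagonal representation (A.2): P = I_r (x) A + C (x) I_n."""
    r, n = C.shape[0], A.shape[0]
    return np.kron(np.eye(r), A) + np.kron(C, np.eye(n))

def marginals(p_y, r, n):
    """Split a joint distribution over z (x) x into (p_z, p_x)."""
    joint = np.asarray(p_y, dtype=float).reshape(r, n)
    return joint.sum(axis=1), joint.sum(axis=0)

def euler(M, p0, steps=500, dt=0.01):
    """Propagate p' = M p with forward-Euler steps."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p + dt * (M @ p)
    return p

C = np.array([[-2.0, 1.0],
              [2.0, -1.0]])
A = np.array([[-1.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -1.0]])
p0 = np.array([0.1, 0.2, 0.0, 0.3, 0.1, 0.3])   # correlated joint start
pz0, px0 = marginals(p0, 2, 3)

pz_T, px_T = marginals(euler(diagonal_generator(C, A), p0), 2, 3)
print(pz_T, euler(C, pz0))
print(px_T, euler(A, px0))
```

Note that the initial joint distribution is deliberately not a product of its marginals: the marginals still evolve under $C$ and $A$ separately, which is the content of the proposition.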

From Propositions A.5 and A.4 we get the following: 

Proposition A.7. Let $y_t = z_t \otimes x_t$ be a Markov process in $r \times n$ states
where $z_t \in \{e_i\}_{i=1}^r$ and $x_t \in \{e_i\}_{i=1}^n$. Then sample paths of $y_t$ can be written
as

$dy_t = (z_t \otimes dx_t) + (dz_t \otimes x_t) + (dz_t \otimes dx_t)$

If $y_t$ is decomposable, then the sample paths can be decomposed into

$z_t \otimes dx_t = z_t \otimes \sum_{j=1}^{m} G_j(z)\, x_t\, dN_j(z_t)$

$dz_t \otimes x_t = \sum_{i=1}^{s} H_i(z)\, z_t\, dM_i(z_t) \otimes x_t$

$dz_t \otimes dx_t = 0$

where, for each $z$, $x$, the matrices $G_j(z)$ and $H_i$ belong to the corresponding matrix classes of Appendix D, and $N_j$, $M_i$ are doubly stochastic
(Markov modulated) Poisson counters. Furthermore, if $y_t$ is a Cascade MC
then we get the following decoupled Ito representation

$dz = \sum_{i=1}^{s} H_i\, z\, dM_i$

$dx = \sum_{i=1}^{m} G_i(z)\, x\, dN_i(z)$

Remark A.8. If $y_t$ is non-decomposable, the term $dz_t \otimes dx_t$ is non-zero,
so we cannot write sample paths in decoupled form.


imsart-ssy ver. 2014/10/16 file: Cascade_MDP_Arxiv.tex date: September 2, 2015 




M. GUPTA. 


APPENDIX B: DIAGONALIZABLE MARKOV DECISION PROCESSES

B.0.1. Properties of Diagonalizable MDPs. If the MDP is diagonalizable,
then some simplifications of the solutions presented above are possible. Once
again consider the optimal control problem (3.2), except that now the cascade
is diagonalizable. Using the notation of Section 3.3, the joint probabilities $p_i$
satisfy (assuming stationarity of $z(t)$)

$\dot p_i = \Big(A_i + \sum_{j=1}^{p} B_{ij} D_{ij}\Big) p_i$

From Proposition A.6 and the fact that the marginal probability vector
of $x(t)$ is $\sum_{i=1}^{r} p_i$, we must have, for some stochastic matrix $A$,

(B.1)   $\sum_{i=1}^{r} \Big(A_i + \sum_{j=1}^{p} B_{ij} D_{ij}\Big) p_i = A \sum_{i=1}^{r} p_i$

Thus we have the following useful lemma:

Lemma B.1. Let $z \in \{e_i\}_{i=1}^r$, $x \in \{e_i\}_{i=1}^n$ and $A(t,z)$, $B_j(t,z)$, $u_j(t,z,x)$,
$j = 1..p$, and a cascade MDP on $z \otimes x$ be as defined in Section 2.2. As before,
use the shorthand $A_i = A(t, e_i)$, $B_{ij} = B_j(t, e_i)$, and write $u_j(t, e_i, x)$ as the
diagonal matrix $D_{ij}$. Then the resulting Markov process is diagonalizable if
and only if there exists a stochastic matrix $A(t)$ such that the joint probabilities,
written as vectors $\{p_i(t) = [p_{i1}, p_{i2} \ldots p_{in}]^T, i = 1..r\}$ where $p_{ik}(t) =
\Pr\{z(t) = e_i, x(t) = e_k\}$, at each $t$ satisfy the equation (B.1), assuming that
$z(t)$ is stationary®.

Corollary B.2. (Sufficient Conditions for diagonalizable MDP). The
cascade MDP defined in the hypothesis of Lemma B.1 is diagonalizable if any
of the following hold:

1. $A(t,z)$, $B_j(t,z)$ and $u_j(t,z,x)$ are independent of $z$, $j = 1, 2..p$. That
is, $U$ is restricted to the set of measurable functions on the space
$\mathbb{R}^+ \times \{e_i\}_{i=1}^n$ only (i.e. no feedback allowed on state $z$).

2. For each $x \in \{e_k\}_{k=1}^n$ and $t$, the sum $A(t,z) + \sum_{j=1}^{p} u_j(t,z,x) B_j(t,z)$
is independent of $z$ for all admissible controls $u_j$.

3. For each $i, k$ such that the $k$'th row of $A_i + \sum_{j=1}^{p} B_{ij} D_{ij}$ does not
vanish for all $t$ and admissible controls $D_{ij}$, the marginal probabilities
$p_i^z(t) = \Pr\{z(t) = e_i\}$ and $p_k^x(t) = \Pr\{x(t) = e_k\}$ are uncorrelated,
i.e. $p_{ik}(t) = p_i^z(t)\, p_k^x(t)$.

® A similar equation can be derived for non-stationary $z(t)$ but is not needed in this paper.
Proof. The first and second conditions are trivial. For the third, note
that if $p_{ik}(t) = p_i^z(t)\, p_k^x(t)$ then we can write the $m$'th row of the left hand
side of (B.1) in fully expanded form, using the notation $(M)_{ij}$ for the $(i,j)$
entry of matrix $M$, as

$\sum_{k=1}^{n} \sum_{i=1}^{r} \Big(A_i + \sum_{j=1}^{p} B_{ij} D_{ij}\Big)_{mk}\, p_{ik}
= \sum_{k=1}^{n} \sum_{i=1}^{r} \Big(A_i + \sum_{j=1}^{p} B_{ij} D_{ij}\Big)_{mk}\, p_i^z\, p_k^x$

$= \sum_{k=1}^{n} \Big(\sum_{i=1}^{r} p_i^z \Big(A_i + \sum_{j=1}^{p} B_{ij} D_{ij}\Big)\Big)_{mk} \sum_{l=1}^{r} p_{lk}$

Setting $A = \sum_{i=1}^{r} p_i^z \big(A_i + \sum_{j=1}^{p} B_{ij} D_{ij}\big)$, which is readily verified to be a
stochastic matrix, the result follows from Lemma B.1. □

B.0.2. Some Problems on Diagonalizable MDPs. Note that from (B.1)
above, a diagonalizable MDP can be rewritten as a partial feedback problem,
by possibly introducing matrices $A_0(t)$, $B_i(t)$ and controls $u_i(t,x)$ such that
$A(t) = A_0(t) + \sum_i u_i(t,x) B_i(t)$. Thus all optimal control problems on
diagonalizable MDPs are in the category of partial feedback problems.

Consider, once again the optimal control problem (3.2) except that now 
the MDP is diagonalizable. Simplified solutions are available in the following 
two cases. 

Theorem B.3. Let $z \in \{e_i\}_{i=1}^r$, $x \in \{e_i\}_{i=1}^n$ and $A_0, A, B_i, T, U, \eta$ be
as defined for the cascade MDP on $z \otimes x$ of Theorem 3.1. In addition, let
$A$, $B_i$ and $U$ satisfy the hypothesis of Corollary B.2.1. Then if the cost functional
$L$ or terminal condition $\phi$ do not depend on $z$, the optimal
control problem defined in (3.2) has the solution

$\eta^* = E\, k^T(0)\, x(0)$

$u^* = \arg\min_{u(x) \in U} \Big(\sum_{i=1}^{p} u_i\, k^T B_i x + \psi(u)\Big)$

where $k$ satisfies the vector differential equation

$\dot k = -A_0^T k - L^T e_i - \min_{u(x) \in U} \Big(\sum_{i=1}^{p} u_i\, k^T B_i x + \psi(u)\Big)$

with $k(T)$ given by the terminal condition.
Proof. In this case, since we have no dependence of $A$, $B_j$ or $u_j$ on $z$,
nor of $L$ or $\phi$, the Bellman equation (3.4) reduces to the single
state Bellman equation (see Theorem 1 in [2]) defined on the state space of
$x(t)$. Hence we can use a much simplified version of the Bellman equation
to find the optimal control. Note, however, that this does not necessarily imply
complete independence, in the sense that the marginal probabilities may still
be correlated. □

Remark B.4. Note that in view of the introductory remark in Section
B.0.2, the condition requiring satisfaction of the hypothesis of Corollary B.2.1 is
not necessary for a diagonalizable MDP.

Theorem B.5. Let $z \in \{e_i\}_{i=1}^r$, $x \in \{e_i\}_{i=1}^n$ and $A_0, A, B_i, T, \psi, \phi, L, \eta$ be
as defined for the cascade MDP on $z \otimes x$ of Theorem 3.1. Let $U$ be restricted
to the set of measurable functions on the space $\mathbb{R}^+ \times \{e_i\}_{i=1}^n$ (i.e. no feedback
allowed on state $z$), and further let the MDP satisfy the hypothesis of Corollary
B.2.3. Using the notation $c_i(t) = \Pr\{z(t) = e_i\}$, $A_i(t) = A(t, e_i)$, $B_{ij}(t) =
B_j(t, e_i)$, the optimal control problem defined in (3.2) has the solution

$\eta^* = c^T(0)\, E\, k^T(0)\, x(0)$

$u^* = \arg\min_{u(x) \in U} \Big(\sum_{j=1}^{p} u_j\, k^T \Big(\sum_{i=1}^{r} c_i B_{ij}\Big) x + \psi(u)\Big)$

where $k$ satisfies the vector differential equation

$\dot k = -A_0^T k - \Big(\sum_{i=1}^{r} c_i A_i^T\Big) k - L^T c - \min_{u(x) \in U} \Big(\sum_{j=1}^{p} u_j\, k^T \Big(\sum_{i=1}^{r} c_i B_{ij}\Big) x + \psi(u)\Big)$

$k(T) = \Phi^T c$

Proof. In this case, if we examine the Hamiltonian in (3.10) we note
that the term to be minimized becomes $\sum_{i=1}^{r} \sum_{j=1}^{p} \sum_{k=1}^{n} p_{ik}\,(u_{jk}\, q_i^T B_{ij})$
(assuming no control cost). But since $p_{ik} = p_k c_i$, where $p_k$ and $c_i$ are the
marginal probabilities of $x(t) = e_k$ and $z(t) = e_i$ respectively, this otherwise
non-trivial minimization becomes trivial, since we can now interchange
the summation order and, writing $B_j = \sum_{i=1}^{r} c_i B_{ij}$, express this sum as
$\sum_{k=1}^{n} p_k \sum_{j=1}^{p} u_{jk} \big(\sum_{i=1}^{r} c_i (q_i^T B_{ij})\big)$. Since $p_k \ge 0$, we achieve minimization
by choosing $u_{jk}$ to minimize $\sum_{i=1}^{r} c_i (q_i^T B_{ij})$. This then becomes the
condition for the minimum in the costate equation as well, and hence we
have removed the dependency of the costate equation on the state $p$, and so we
can solve the costate equation (i.e. this becomes the single state Bellman
equation). □

APPENDIX C: PORTFOLIO OPTIMIZATION 

C.1. Background: Portfolio Value, Wealth and Investment. A
portfolio consists of a finite set of assets (such as stocks or bonds), with the
weight process $x_t$ denoting the vector of amounts (also called allocations or
weights) of the assets. The price process $z_t$ denotes the vector of market
prices of the assets. We define the portfolio value $v(t,z,x)$ as the net value
of the current asset holdings for weights $x$ and prices $z$. If $x(t)$ and $z(t)$ take
values in finite sets of standard basis vectors, then $v$ can be represented by
the matrix $V(t)$ as $v(t,z,x) = z^T V(t) x$. Using the Ito rule, we can write

$dv = dz^T V x + z^T V dx$

In a non self-financing model, depending on the current value of the portfolio,
a weight shift will require buying/selling assets using an investment (or
consumption, which is the negative of the investment). If $s(t)$ represents the
net investment into the portfolio up to time $t$, the incremental investment
is the change in the portfolio value due to the weight shift. Hence,

(C.1)   $ds = z^T V dx$

Similarly, the wealth of the portfolio (i.e. its intrinsic worth) at time $t$ is
defined as $w(t) = v(t) - s(t)$. Thus the wealth represents the net effect of
changes in asset prices, and we can write

(C.2)   $dw = dz^T V x$

C.2. Self-Financing Portfolio Problem. We assume there are two
stocks $S_1$ and $S_2$ whose prices each evolve independently on a state space of
$\{-1, 1\}$. Assume a portfolio that can shift weights between the two assets
with allowable weights $W$ of $(2,0)$, $(1,1)$, $(0,2)$ so that the portfolio has a
constant total position (of 2). Further, we allow only weight adjustments of
+1 or -1 for each asset, and we further restrict the weight shifts to only
those that do not cause a change in net value for any given asset price. The
latter condition makes the portfolio self-financing.

The resulting process can be modeled as a cascade MDP. Let $z_t$ be the
(joint) prices of the two assets with prices $(-1,-1)$, $(-1,1)$, $(1,-1)$, $(1,1)$
represented as states $e_1, e_2, e_3, e_4$ respectively. Let $x_t$ be the choice of weights
with weights $(0,2)$, $(1,1)$, $(2,0)$ represented as states $e_1, e_2, e_3$ respectively.
Transition rates of $z_t$ are determined by some pricing model, whereas the
rates of $x_t$, which represent allowable weight shifts, are controlled by the
portfolio manager. The portfolio value $v(z_t, x_t)$ can be written using its
matrix representation, $v(z,x) = z^T V^T x$, where $V$ is

(C.3)    $V = [\,\ldots\,]$ (matrix garbled in the source; surviving entries: 0, -2, -2)


The portfolio manager is able to adjust the rate of increasing the first weight
by an amount $u$ and, independently, that of decreasing the first weight by an
amount $d$ (which has the effect of simultaneously decreasing or increasing
the weight of the second asset). The resulting transitions of $x_t$ depend on
$z_t$ (see Figure C.2) and the transition matrices $P(z)$ of the weights $x_t$ can be
written as $P(z) = A(z) + uB(z) + dD(z)$, where $A(z)$, $B(z)$, $D(z)$ are:


/-I 1 0 \ 

A(ei) = ^ 1 -2 1 

Vo 1 - 1 / 

/-I 0 0\ 

B(ei) =1 -10 

Vo 1 0/ 

/o 1 0 \ 

-D(ei) = 0 -1 1 

V^O 0 - 1 / 


/-I 1 0 \ 

A(e2) = I 1 -1 0 

\ 0 0 0/ 

/-I 0 0 \ 

5(62) =1 0 0 

Vo 0 0 / 

/o 1 0\ 

D(e2) = I 0 -1 0 I 

V^O 0 0 / 


/O 0 0 \ 

A(e3) = I 0 -1 1 

\o 1 -V 

/O 0 0 \ 

5(63) = 0 -1 0 

Vo 1 0/ 

/o 0 0 \ 

5(63) = I 0 0 1 I 

V^o 0 - 1 / 


A(e4) - r 
5 ( 64 ) - 
- 0 ( 64 ) ^ 



1 

-2 

1 

0 0 

-1 0 

1 0 

1 0 

-1 1 

0 


For $P(z)$ to be a proper transition matrix we require admissible controls $u, d$
to satisfy $|u|, |d| \le \tfrac{1}{2}$. The portfolio manager may choose $u, d$ in accordance
with current values of $x_t$ and $z_t$, so that $u, d$ are Markovian feedback controls
$u(t, z_t, x_t)$ and $d(t, z_t, x_t)$. Note that this model differs from the traditional
Merton-like models where only feedback on the total value $v_t$ is allowed.
Note also that it is the self-financing constraint that leads to the dependence on
the current price $z_t$ of the transitions of $x$, which allows us to model this
problem as a cascade.
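
The admissibility requirement can be checked numerically. The following is a minimal sketch rather than the authors' code: it transcribes $A(z)$, $B(z)$, $D(z)$ for the four price states, assumes the control bound is 1/2, assumes the matrices for $e_4$ coincide with those for $e_1$, and uses hypothetical helper names (`generator`, `is_generator`). It verifies that $P(z) = A(z) + uB(z) + dD(z)$ has zero column sums and non-negative off-diagonal rates.

```python
import itertools
import numpy as np

BOUND = 0.5  # assumed admissibility bound on |u| and |d|

# Matrices for price states e1..e3, transcribed from the text
# (column convention: each column sums to zero).
A = {
    1: np.array([[-1, 1, 0], [1, -2, 1], [0, 1, -1]], dtype=float),
    2: np.array([[-1, 1, 0], [1, -1, 0], [0, 0, 0]], dtype=float),
    3: np.array([[0, 0, 0], [0, -1, 1], [0, 1, -1]], dtype=float),
}
B = {
    1: np.array([[-1, 0, 0], [1, -1, 0], [0, 1, 0]], dtype=float),
    2: np.array([[-1, 0, 0], [1, 0, 0], [0, 0, 0]], dtype=float),
    3: np.array([[0, 0, 0], [0, -1, 0], [0, 1, 0]], dtype=float),
}
D = {
    1: np.array([[0, 1, 0], [0, -1, 1], [0, 0, -1]], dtype=float),
    2: np.array([[0, 1, 0], [0, -1, 0], [0, 0, 0]], dtype=float),
    3: np.array([[0, 0, 0], [0, 0, 1], [0, 0, -1]], dtype=float),
}
# Assumption: price state e4 uses the same matrices as e1.
A[4], B[4], D[4] = A[1].copy(), B[1].copy(), D[1].copy()


def generator(z, u, d):
    """Return P(z) = A(z) + u B(z) + d D(z) for price state index z in 1..4."""
    return A[z] + u * B[z] + d * D[z]


def is_generator(P, tol=1e-12):
    """True if columns sum to zero and all off-diagonal rates are >= 0."""
    off_diag = P - np.diag(np.diag(P))
    return bool(np.all(off_diag >= -tol) and np.all(np.abs(P.sum(axis=0)) <= tol))


# Check every price state at the extreme admissible controls.
corners = list(itertools.product([-BOUND, BOUND], repeat=2))
all_ok = all(is_generator(generator(z, u, d)) for z in A for (u, d) in corners)
```

Because $P(z)$ is affine in $(u, d)$, checking the four corner controls is enough to cover the whole admissible square.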


Consider the problem of maximizing the expected terminal value $v(T)$
of the portfolio for a fixed horizon $T$ for the above self-financing portfolio
model (Section 2.3.1). With $x, u, d, A, B, D$ as defined there, we wish to maximize
the performance measure given by

$\eta(u,d) = E(v(T))$

Using Theorem 3.1 we see the solution to this OCP-I problem is obtained
by solving the matrix equation with boundary condition $K(T) = -V$

(C.4)    $\dot{K} = -KC - A^T(z)K + \tfrac{1}{2}\,|B^T(z)K| + \tfrac{1}{2}\,|D^T(z)K|$



Fig 7. Transition diagram of weight $x(t)$ in the self-financing portfolio for various asset
prices $z(t)$, shown in (a), (b) and (c). States $e_1, e_2, e_3, e_4$ of $z(t)$ correspond to price
vectors $(-1,-1)$, $(-1,1)$, $(1,-1)$, $(1,1)$ respectively. Self-transitions are omitted for clarity.


with the optimum performance measure and controls (in feedback form)
given by

$\eta^* = z^T(0) K^T(0) x(0)$

$u^*(t,z,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left(z^T K^T(t) B(z) x\right)$

$d^*(t,z,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left(z^T K^T(t) D(z) x\right)$

with $K(t)$ being the solution to (C.4). Some solutions for (C.4) and corresponding
optimal controls are plotted for $T = 1, 15$ in Figure 8 for various
initial conditions (mixes of the assets in the portfolio initially). Results also
show that as $T \to \infty$, the value of $\eta^*$ approaches a constant value of 0.4725
regardless of the initial values $z(0), x(0)$. That is, the maximal possible terminal
value for the portfolio is 0.4725. However, we do not see a steady
state constant value for the optimal controls $u^*(z,x)$ and $d^*(z,x)$; near the
portfolio expiration date, more vigorous buying/selling activity is
necessary. If the matrix $C$ were reducible or time-varying in our example,
multiple steady states would be possible as $T \to \infty$ and the initial trading activity
would be more significant.

C.3. An Investment-Consumption Portfolio Problem. An alternate
model for portfolio allocation to that discussed in the self-financing portfolio
example (Section ) is presented as an OCP-I problem in this section.

Fig 8. Solution to problem C.2. Minimum Return Function $k(t,z,x)$, optimal up controls
$u^*(t,z,x)$ and down controls $d^*(t,z,x)$ for the self-financing portfolio with maximal terminal
wealth. Figures (a) and (b) are for $T = 1$ and $T = 15$ respectively. Various $(z,x)$
values are represented by the vectors $(e_i, e_j)$.


If we do not restrict the weight adjustments in the model of Section 2.3.1
to cases which keep the value constant (i.e. we allow only weight adjustments
of +1 or -1 for each asset, regardless of the current portfolio value),
we get a non self-financing portfolio. The difference in the portfolio value
as a result of a weight shift must be the result of an equivalent investment or
consumption. Once again modeling this as a cascade with $z_t$ and $x_t$ as in
Section 2.3.1, the portfolio value matrix (C.3) is replaced by

(C.5)    $V = \begin{pmatrix} -2 & 2 & -2 & 2 \\ -2 & 0 & 0 & 2 \\ -2 & 2 & 2 & 2 \end{pmatrix}$

As before, the portfolio manager can control the up and down rates $u, d$,
resulting in the transitions of $x_t$ (see Figure) described by the matrices
$P(z) = A(z) + uB(z) + dD(z)$ with

$B(z) = \begin{pmatrix} -1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 1 & 0 \end{pmatrix}$, $D(z) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix}$

and admissibility condition $|u|, |d| \le \tfrac{1}{2}$. Note that in this cascade model the
matrices $A, B, D$ do not depend on $z$, but we will see next that the performance
measure does depend on $z$.

Fig 9. Transition diagram of weights $x(t)$ for controls $u, d$ in the investment/consumption
portfolio. Self-transitions are omitted for clarity.


C.3.1. Problem 1: Minimal Investment. We wish to minimize the total
amount of investment up to a fixed horizon $T$. We can write a performance
measure $\eta(u, d)$ that represents the net investment into the portfolio up to
time $T$ as

$\eta(u,d) = E(s(T))$

where $s(t)$ is the investment process (see Appendix C.1). Using (C.1),

$E(ds(t)) = E(z^T V^T dx) = E\left(z^T V^T (A + uB + dD) x\right) dt$

Writing the matrix $\Phi(u, d) = V^T (A + uB + dD)$,

$\eta(u,d) = E \int_0^T z^T(t)\,\Phi(u,d)\,x(t)\,dt$

The goal then is to choose $u, d$ so as to minimize $\eta(u, d)$ subject to $u, d \in \mathcal{U}$,
where the admissibility set $\mathcal{U}$ is the set of past measurable functions $u(z, x)$
such that $|u(z, x)| \le \tfrac{1}{2}$ for each $x$. Using Theorem 3.1 we see the solution
to this OCP-I problem is obtained by solving the matrix equation with
boundary condition $K(T) = 0$

(C.6)    $\dot{K} = -KC - A^T(K + V) + \tfrac{1}{2}\,|B^T(K + V)| + \tfrac{1}{2}\,|D^T(K + V)|$

(where the notation $|M|$ for a matrix $M$ above represents the element-by-element
absolute value of the matrix) with the optimal performance measure
and controls (in feedback form) given by

$\eta^* = z^T(0) K^T(0) x(0)$

$u^*(t,z,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left(z^T (K(t) + V)^T B x\right)$

$d^*(t,z,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left(z^T (K(t) + V)^T D x\right)$

where $K(t)$ is the solution to (C.6). Some solutions to (C.6) and corresponding
optimal controls are plotted for $T = 1, 10$ in Figure 10(a) and 10(b).
Results also show that as $T \to \infty$, the value of $\eta^*/T$ approaches a constant
value of -0.535 regardless of the initial values $z(0), x(0)$, and in this case we
see that the optimal controls $u^*(z,x)$ and $d^*(z,x)$, expressed in matrix form
($u^*(z,x)$ written as $z^T u^* x$ etc.), are

[the matrices $u^*$ and $d^*$ are missing from the source text]

(the values of $u^*(z, e_3)$ and $d^*(z, e_1)$ are immaterial as they do not impact
the dynamics). This means that one can expect a constant cash flow of
0.535 by the above strategy, and that this value is maximal. Note also that
the optimal controls do depend on $z$ and so the resulting weight and asset
probabilities are not independent.
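
The backward integration of (C.6) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the computation behind the reported figures: the price generator `C` is hypothetical (two independent prices, each flipping sign at unit rate), the uncontrolled generator `A` is assumed to be the unrestricted birth-death matrix since it is not displayed for this model, the control bound is assumed to be 1/2, and the function names are invented for the example. $K$ is stored as a 3 x 4 array with rows indexed by the weight state $x$ and columns by the price state $z$, as for $V$ in (C.5).

```python
import numpy as np

BOUND = 0.5  # assumed admissibility bound |u|, |d| <= 1/2

# Value matrix (C.5): rows index weight states x, columns index price states z.
V = np.array([[-2, 2, -2, 2],
              [-2, 0, 0, 2],
              [-2, 2, 2, 2]], dtype=float)
B = np.array([[-1, 0, 0], [1, -1, 0], [0, 1, 0]], dtype=float)
D = np.array([[0, 1, 0], [0, -1, 1], [0, 0, -1]], dtype=float)
# Assumption: uncontrolled weight generator (not displayed for this model).
A = np.array([[-1, 1, 0], [1, -2, 1], [0, 1, -1]], dtype=float)
# Hypothetical price generator: two independent +/-1 prices, unit flip rate.
G = np.array([[-1.0, 1.0], [1.0, -1.0]])
C = np.kron(G, np.eye(2)) + np.kron(np.eye(2), G)


def k_dot(K, V_mat):
    """Right-hand side of (C.6) for the 3 x 4 matrix K."""
    W = K + V_mat
    return (-K @ C - A.T @ W
            + BOUND * np.abs(B.T @ W) + BOUND * np.abs(D.T @ W))


def solve_backward(T, steps=2000, V_mat=V):
    """Integrate (C.6) from K(T) = 0 back to t = 0 with classical RK4."""
    h = T / steps
    K = np.zeros_like(V_mat)  # terminal condition K(T) = 0
    for _ in range(steps):
        # One RK4 step of size -h (time runs backward from T to 0).
        k1 = k_dot(K, V_mat)
        k2 = k_dot(K - 0.5 * h * k1, V_mat)
        k3 = k_dot(K - 0.5 * h * k2, V_mat)
        k4 = k_dot(K - h * k3, V_mat)
        K = K - (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return K


def feedback(K, V_mat=V):
    """Bang-bang controls; entry [x, z] is the control used in state (z, x)."""
    W = K + V_mat
    return -BOUND * np.sign(B.T @ W), -BOUND * np.sign(D.T @ W)


K0 = solve_backward(T=1.0)        # K(0) for horizon T = 1
u_star, d_star = feedback(K0)     # feedback controls at t = 0
```

With this layout, `K0[i, j]` plays the role of $\eta^*$ for initial weight state $e_{i+1}$ and price state $e_{j+1}$. Since `C` and `A` are assumptions, the numbers produced need not reproduce the value -0.535 quoted above.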

C.3.2. Problem 2: Maximal Terminal Wealth. In this case the performance
measure that needs to be maximized is given by

$\eta(u,d) = E(w(T)) = E \int_0^T z^T(t)\,C^T V^T x(t)\,dt$

where $w(t)$ is the wealth process (Appendix C.1) for $u, d \in \mathcal{U}$ as above.
Again, from Theorem 3.1 the solution to this OCP-I problem is obtained
by solving the matrix equation with boundary condition $K(T) = 0$

(C.7)    $\dot{K} = -(K - V)C - A^T K + \tfrac{1}{2}\,|B^T K| + \tfrac{1}{2}\,|D^T K|$

Fig 10. Solution to problem C.3.1. Minimum Return Function $k(t,z,x)$, optimal up controls
$u^*(t,z,x)$ and down controls $d^*(t,z,x)$ for the self-financing portfolio with maximal
terminal wealth. Figures (a) and (b) are for $T = 1$ and $T = 10$ respectively. Various $(z,x)$
values are represented by the vectors $(e_i, e_j)$.


whose solution $K(t)$ gives the optimal performance measure and controls as:

$\eta^* = z^T(0) K^T(0) x(0)$

$u^*(t,z,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left(z^T K^T(t) B x\right)$

$d^*(t,z,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left(z^T K^T(t) D x\right)$

Some numerical results for the above problem with $V$ as in (C.5) are plotted
for $T = 1, 10$ in Figure 11(a) and 11(b). Results also show that as $T \to \infty$,
the value of $\eta^*/T$ approaches a constant value of -0.533 regardless of
the initial values $z(0), x(0)$, and in this case we see that the optimal controls
$u^*(z,x)$ and $d^*(z,x)$, expressed in matrix form ($u^*(z,x)$ written as $z^T u^* x$
etc.), are

[the matrices $u^*$ and $d^*$ are missing from the source text]

Fig 11. Solution to problem C.3.2. Minimum Return Function $k(t,z,x)$, optimal up controls
$u^*(t,z,x)$ and down controls $d^*(t,z,x)$ for the self-financing portfolio with maximal
terminal wealth. Figures (a) and (b) are for $T = 1$ and $T = 10$ respectively. Various $(z,x)$
values are represented by the vectors $(e_i, e_j)$.


C.3.3. Problem 3: Minimal Investment with Partial Feedback. In the
investment/consumption model, the control matrices $A, B, D$ do not depend
on $z$. As a result one may be tempted to think that a partial feedback
optimization problem, i.e. one where the controls are allowed to depend on $x$
but not $z$, would give the same optimal performance. However, one sees from
Theorem 3.1 that the solution to the minimal investment case is obtained by
solving the matrix equation subject to $K(T) = 0$

(C.8)    $\dot{p}_z = C p_z, \quad p_z(0) = E z(0)$

$\dot{K} = -KC - A^T(K + V) + \tfrac{1}{2}\,|B^T(K + V)(e_r p_z^T)| + \tfrac{1}{2}\,|D^T(K + V)(e_r p_z^T)|$

where $p_z$ is the probability vector for $z$. The optimal performance and
controls are given by

$\eta^* = p_z^T(0) K^T(0) x(0)$

$u^*(t,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left((e_r p_z^T(t)) (K(t) + V)^T B x\right)$

$d^*(t,x) = -\tfrac{1}{2}\,\mathrm{sgn}\left((e_r p_z^T(t)) (K(t) + V)^T D x\right)$

where $K(t), p_z(t)$ are solutions to (C.8). The best performance in this case
is worse than that in the full feedback case, as indeed shown by the numerical
simulations in Figure 12(a), (b) for $T = 1, 10$. Comparing with the respective
minimum return functions of the full feedback case, the steady state
maximal cash flow rate is only 0.22 compared to 0.533.
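
The price marginal $p_z(t)$ entering (C.8) evolves independently of the weights and can be computed on its own. The sketch below assumes a hypothetical symmetric price generator `C` (two independent prices, each flipping sign at unit rate; the pricing model itself is not specified in this appendix) and evaluates $p_z(t) = e^{Ct} p_z(0)$ by eigendecomposition. The function name `price_marginal` is illustrative.

```python
import numpy as np

# Hypothetical generator for z_t: two independent prices on {-1, 1}, each
# flipping sign at rate 1. State order e1..e4 = (-1,-1), (-1,1), (1,-1), (1,1).
G = np.array([[-1.0, 1.0], [1.0, -1.0]])
C = np.kron(G, np.eye(2)) + np.kron(np.eye(2), G)


def price_marginal(p0, t):
    """Return p_z(t) solving dp_z/dt = C p_z with p_z(0) = p0."""
    # C is symmetric here, so exp(C t) = Q diag(exp(w t)) Q^T.
    w, Q = np.linalg.eigh(C)
    return Q @ (np.exp(w * t) * (Q.T @ p0))


p0 = np.array([1.0, 0.0, 0.0, 0.0])   # prices start in state e1
p_late = price_marginal(p0, 10.0)     # marginal after a long time
```

For this particular symmetric `C` the marginal relaxes to the uniform distribution over the four price states; a non-symmetric pricing model would need a general matrix exponential instead of `eigh`.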

C.4. Some Variations on Portfolio Problems. Some variants of the 
examples presented here and in Section 3.4 include the following: 

C.4.1. Utility Functions and Discounting. In traditional portfolio optimization
problems, one minimizes $E(U(s(T)))$ or maximizes $E(U(w(T)))$ where
$U(\cdot)$ is a non-decreasing and concave function, called the utility function. In
the above examples, for simplicity of demonstration of the MDP techniques,
we assumed $U(C) = C$. Utility functions are chosen based upon risk preferences
of agents and the financial environment, and some standard ones
include $U(C) = C^\gamma/\gamma$ (with $\gamma < 1$) or $U(C) = \log C$. Furthermore, one
may wish to optimize the discounted value, i.e. $E \int_0^T e^{-\alpha t} U(w(t))\,dt$, for some
$\alpha > 0$ instead. The solutions to optimization problems of minimum investment
and maximum wealth in these cases are identical to (C.6) and (C.7)
with $V$ replaced by $e^{-\alpha t} U(V)$.

C.4.2. Value Payoff Functions. The particular model we chose led to a
value payoff $V$ as in (C.5), though the problems presented above are completely
generic with respect to $V$ in that any other value of $V$ would work
as well. In that case we will have different mappings of the states $e_1, e_2, e_3$
of $x$ to the weights and of $e_1, e_2, e_3, e_4$ of $z$ to asset prices, but it is only
the value matrix $V$ that appears in any of the solutions and these mappings
are immaterial.

Fig 12. Solution to problem C.3.3. Minimum Return Function $k(t,z,x)$, optimal up controls
$u^*(t,z,x)$ and down controls $d^*(t,z,x)$ for the self-financing portfolio with maximal
terminal wealth. Figures (a) and (b) are for $T = 1$ and $T = 10$ respectively. Various $(z,x)$
values are represented by the vectors $(e_i, e_j)$.

C.4.3. Transaction Costs. If buying/selling of assets incurs a transaction
cost then every weight shift is associated with a cost. This can be modeled
in terms of the control costs. We can see that a value of $u = -\tfrac{1}{2}$ represents
the case of a minimal rate of buying the first asset, while $u = \tfrac{1}{2}$ represents a
maximal rate of buying the first asset. Likewise, the values $d = -\tfrac{1}{2}$ to $d = \tfrac{1}{2}$
represent the range of the rates of selling the first asset. Hence a reasonable
metric for the transaction costs would be $(u + \tfrac{1}{2})^2 + (d + \tfrac{1}{2})^2$. For example,
a performance measure like ($\alpha > 0$)

$\eta(u, d) = E \int_0^T \alpha\left((u + \tfrac{1}{2})^2 + (d + \tfrac{1}{2})^2\right) dt + E(U(s(T)))$

APPENDIX D: SUMMARY OF NOTATIONS AND SYMBOLS

A stochastic basis $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is assumed, where $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability
space and $\mathbb{F}$ a filtration $(\mathcal{F}_t)_{t \in \mathcal{T}}$ on this space for a totally ordered index
set $\mathcal{T}$ ($\subseteq \mathbb{R}^+$ in our case). All stochastic processes are assumed to be right
continuous and adapted to $\mathbb{F}$.

$\mathbb{F}$    A filtration $(\mathcal{F}_t)_{t \in \mathcal{T}}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ where $\mathcal{T}$ is a totally ordered index set

(symbol lost)    The space of square matrices of dimension $n$ of the form $F_{ji} - F_{ii}$, where $F_{ij}$ is the matrix of all zeros except for a one in the $i$'th row and $j$'th column

$E^n$    The space of diagonal $n \times n$ matrices with only 1's or 0's

$I_n$    $n \times n$ identity matrix, $\in E^n$

$F^n$    The space of all stochastic matrices of dimension $n$

$\{e_i\}_{i=1}^n$    The set of $n$ standard basis vectors in $\mathbb{R}^n$

$\varphi(t)$    A real-valued function $\phi$ on $\mathbb{R}^+ \times \{e_i\}_{i=1}^n$ will be written as the vector $\varphi(t) \in \mathbb{R}^n$ as $\phi(t, x) = \varphi^T(t) x$ where $x \in \{e_i\}_{i=1}^n$

$\Phi(t)$    A real-valued function $\phi$ on $\mathbb{R}^+ \times \{e_i\}_{i=1}^r \times \{e_i\}_{i=1}^n$ written as the $r \times n$ real matrix $\Phi(t)$ as $\phi(t, z, x) = z^T \Phi(t) x$ where $z \in \{e_i\}_{i=1}^r$ and $x \in \{e_i\}_{i=1}^n$

$A^T(z)K$    Denotes the matrix whose $j$'th column is $A^T(e_j) K e_j$, which can be more explicitly written as $\sum_z A^T(z) K z z^T$

$|M|$    For a matrix $M$, represents the element-by-element absolute value of the matrix

$M^2$    For a matrix $M$, represents the element-by-element square

$e_r$    The $r$-vector $[1\ 1 \ldots 1]^T$
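
The column-wise operator $A^T(z)K$ defined above can be illustrated numerically. This is a sketch under the assumption that $K$ is stored with one column per price state $e_j$ and that $A(e_j)$ acts on the rows; the function names are hypothetical. It computes the operator both column by column and through the explicit sum $\sum_z A^T(z) K z z^T$, which should agree.

```python
import numpy as np


def at_z_times_k(A_by_z, K):
    """Column j of the result is A(e_j)^T applied to column j of K."""
    n, r = K.shape
    out = np.empty((n, r))
    for j in range(r):
        out[:, j] = A_by_z[j].T @ K[:, j]
    return out


def at_z_times_k_sum(A_by_z, K):
    """Same operator written as the sum over basis vectors z of A(z)^T K z z^T."""
    n, r = K.shape
    out = np.zeros((n, r))
    for j in range(r):
        z = np.zeros(r)
        z[j] = 1.0                      # z = e_j
        out += A_by_z[j].T @ K @ np.outer(z, z)
    return out


# Small random instance: n = 3 weight states, r = 4 price states.
rng = np.random.default_rng(0)
A_list = [rng.standard_normal((3, 3)) for _ in range(4)]
K_demo = rng.standard_normal((3, 4))
same = np.allclose(at_z_times_k(A_list, K_demo), at_z_times_k_sum(A_list, K_demo))
```

When $A$ does not depend on $z$, as in the investment-consumption model, both functions reduce to the ordinary product $A^T K$.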
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